1
CPSC 325 - Compiler Tutorial 9 Review of Compiler
2
Compiler and compilation
A high-level programming language is usually described in terms of a grammar
– The grammar specifies the form, or syntax, of legal statements in the language
– Compilation = matching statements written by the programmer to structures defined by the grammar and generating the appropriate object code for each statement
We can see a source program as a sequence of tokens
– Keywords, variables, blocks, etc.
3
Lexical analysis/scanner
Lexical analysis is the task of scanning the source statements, recognizing and classifying the various tokens
The scanner is the part of the compiler that performs lexical analysis
It helps the parser and makes the parser run more efficiently
4
Parser
Each statement in the program is recognized as some language construct described by the grammar, such as a declaration or an assignment statement
5
Symbol table/Analyzer (Optional)
Builds the symbol table and allocates the memory locations for the program
It can be a very messy task
There are many different ways to implement it
The symbol table is used throughout the whole program
Once the locations have been allocated, we do NOT need the symbol table anymore
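The allocation step above can be sketched as follows. This is only a minimal illustration: the `SymbolTable` class, its method names, and the fixed 4-byte word size are assumptions made for the example, not something the tutorial prescribes.

```python
class SymbolTable:
    """Maps each identifier to a memory location (hypothetical layout)."""

    def __init__(self, word_size=4):
        self.word_size = word_size   # assumed size of every variable, in bytes
        self.entries = {}            # identifier name -> address
        self.next_address = 0        # next free location

    def declare(self, name):
        # Allocate a location only the first time a name is seen.
        if name not in self.entries:
            self.entries[name] = self.next_address
            self.next_address += self.word_size
        return self.entries[name]

    def lookup(self, name):
        # Raises KeyError for an undeclared identifier.
        return self.entries[name]


table = SymbolTable()
for name in ["A", "B", "A", "C"]:
    table.declare(name)
print(table.entries)   # {'A': 0, 'B': 4, 'C': 8}
```

After this step, later phases can work with the addresses alone, which is why the table can be dropped once every location is known.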
6
Code generator
Generates the object/target code
Sometimes the target/object code is optimized by the optimizer
Note 1: The optimizer is totally optional
Note 2: It is possible to compile a program in a single pass
Note 3: Compilers that perform code optimization generally make several passes
7
Compiler ideas
Compilers divide their problem into steps, or passes, to conquer it
The initial pass takes the source program as input
The last pass outputs the code for execution
8
Compiler
A program or set of programs that translates one language into another
9
Passes
Pass 1: Preprocessor
– Macro substitution
– Strips comments from the source code
Pass 2: Lexical analyzer, parser, code generator
– Heart of the compiler
– Translates the source into a platform-independent language much like assembler (intermediate code)
10
Passes (cont.)
Pass 3: Optimizer
– Improves the quality of the intermediate code
Pass 4: Back end
– Translates the optimized code to real assembler language or directly to some binary executable code
– Provides target independence for earlier phases
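As one possible illustration of what an optimizer pass might do to intermediate code, here is a small constant-folding sketch. The instruction set (`PUSH`, `LOAD`, `ADD`, `MUL` for an imaginary stack machine) and the function name `fold_constants` are assumptions made for this example; the tutorial does not define a concrete intermediate language.

```python
def fold_constants(code):
    """Peephole pass: replace PUSH a, PUSH b, ADD/MUL with a single PUSH."""
    out = []
    for instruction in code:
        op = instruction[0]
        if (op in ("ADD", "MUL") and len(out) >= 2
                and out[-1][0] == "PUSH" and out[-2][0] == "PUSH"):
            b = out.pop()[1]          # right operand, pushed last
            a = out.pop()[1]          # left operand
            out.append(("PUSH", a + b if op == "ADD" else a * b))
        else:
            out.append(instruction)   # anything else is copied unchanged
    return out


# Hypothetical intermediate code for 2 * 3 + B
program = [("PUSH", 2), ("PUSH", 3), ("MUL",), ("LOAD", "B"), ("ADD",)]
print(fold_constants(program))   # [('PUSH', 6), ('LOAD', 'B'), ('ADD',)]
```

The pass never looks at the source text or the target machine, which is the sense in which it works purely on the intermediate code.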
11
Lexical analyzer/Scanner
Scans the program to be compiled and recognizes the tokens that make up the source statements
Converts the incoming source into a series of basic language elements
– A = B +3 has 5 tokens
Tokens have meaning and are indivisible
– In C, “while” is one token; you can’t split it into “wh” and “ile”
– Can be placed into the symbol table and have information associated with them: type, value, name, relationship to other structures
– Can be referenced by a unique integer for later use
12
Lexical Analyzer/Scanner (cont)
Scanners are usually designed to recognize keywords, operators, and identifiers, as well as integers, floating-point numbers, and others
The “Longest Match Rule”: unless otherwise stated, the scanner matches the longest token it knows (for example, >> is one token, NOT > followed by >)
A variable name is recognized as ONE token instead of many characters
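A scanner along these lines could be sketched with regular expressions. The token classes, the keyword set, and the function name `scan` are assumptions chosen for the example. Python's regex alternation takes the first alternative that matches rather than the longest one, so the sketch lists `>>` before `>` to get the longest-match behaviour.

```python
import re

# Hypothetical token classes. Longer operators come first so that
# ">>" is preferred over ">" (the longest match rule).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r">>|<<|>=|<=|==|[+\-*/=<>]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
KEYWORDS = {"while", "if", "else"}   # assumed keyword set


def scan(source):
    """Return a list of (kind, text) tokens for the given source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue                   # whitespace produces no token
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"           # keywords look like identifiers at first
        tokens.append((kind, text))
    return tokens


print(scan("A = B +3"))   # five tokens: A, =, B, +, 3
print(scan("x >> 2"))     # ">>" comes out as a single OP token
```

Because an identifier is matched as a whole, `whilex` is one IDENT token and not the keyword `while` followed by `x`.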
13
Lexical Analyzer/Scanner (cont)
The output of the scanner is a sequence of coded tokens
Token specifier: gives the identifier name, value, etc., that was found by the scanner
– Some scanners are designed to enter identifiers directly into a symbol table
– The token specifier for an identifier might be a pointer to the symbol-table entry for that identifier
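The idea of token codes and token specifiers might look like this in practice. The integer codes in `TOKEN_CODES`, the token kind names, and the `encode` function are all hypothetical; a list index stands in for the "pointer" into the symbol table.

```python
# Hypothetical integer codes for each token class.
TOKEN_CODES = {"IDENT": 1, "NUMBER": 2, "ASSIGN": 3, "PLUS": 4}


def encode(tokens):
    """Turn (kind, text) pairs into (code, specifier) pairs."""
    symbol_table = []   # one entry per distinct identifier
    index_of = {}       # identifier name -> index in symbol_table
    coded = []
    for kind, text in tokens:
        if kind == "IDENT":
            if text not in index_of:
                index_of[text] = len(symbol_table)
                symbol_table.append({"name": text})
            specifier = index_of[text]   # "pointer" to the symbol-table entry
        elif kind == "NUMBER":
            specifier = int(text)        # the value found by the scanner
        else:
            specifier = None             # operators need no specifier
        coded.append((TOKEN_CODES[kind], specifier))
    return coded, symbol_table


tokens = [("IDENT", "A"), ("ASSIGN", "="), ("IDENT", "B"), ("PLUS", "+"), ("NUMBER", "3")]
coded, table = encode(tokens)
print(coded)   # [(1, 0), (3, None), (1, 1), (4, None), (2, 3)]
print(table)   # [{'name': 'A'}, {'name': 'B'}]
```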
14
Parser
The parser analyzes the source grammatically to determine whether it meets the language specification and to develop a representation better suited to code generation
The parser invokes the lexical analyzer to get the next token (a reference into the symbol table) and its corresponding lexeme
It checks the syntax of a sentence
15
Parser (cont)
17
To summarize
– The parser turns the token stream into a parse tree
– The parse tree is a structural representation of the sentence or program being parsed
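To make the token-stream-to-tree step concrete, here is a small recursive-descent sketch for arithmetic expressions. The grammar, the nested-tuple tree shape, and the function name `parse` are assumptions for the example, and the result is a simplified tree (closer to an abstract syntax tree than to a full parse tree with one node per grammar rule).

```python
def parse(tokens):
    """Build a tree of nested tuples from a list of token strings.

    Assumed grammar:
        expr   ::= term   { ("+" | "-") term }
        term   ::= factor { ("*" | "/") factor }
        factor ::= name-or-number | "(" expr ")"
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if token in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return advance()   # a name or a number is a leaf

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse(["B", "+", "3", "*", "C"]))   # ('+', 'B', ('*', '3', 'C'))
```

The shape of the tree records that `*` binds tighter than `+`, information that is not visible in the flat token stream.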
18
Analyzer and Symbol Table
Omitted, since not everyone in the class does it
The analyzer generates the symbol table for later use
19
Code Generator
The last task of compilation is the generation of object code
Most compilers generate the output of the code generator as the parse progresses, instead of leaving it until after a parse tree is built
Small parts of the parse tree fill in code templates that are generated by the code generator
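Generating code while the parse progresses could look roughly like the sketch below, where each grammar rule appends its instructions as soon as it has been recognized and no tree is ever built. The stack-machine instructions (`PUSH`, `LOAD`, `ADD`, `MUL`) and the function name `compile_expression` are assumptions for illustration, and error handling is left out.

```python
def compile_expression(tokens):
    """Emit stack-machine code while parsing, without building a tree."""
    code = []
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():
        token = advance()
        if token.isdigit():
            code.append(("PUSH", int(token)))   # template for a constant
        else:
            code.append(("LOAD", token))        # template for a variable

    def term():
        factor()
        while peek() == "*":
            advance()
            factor()
            code.append(("MUL",))               # emitted once both operands are done

    def expr():
        term()
        while peek() == "+":
            advance()
            term()
            code.append(("ADD",))

    expr()
    return code


print(compile_expression(["B", "+", "3", "*", "C"]))
# [('LOAD', 'B'), ('PUSH', 3), ('LOAD', 'C'), ('MUL',), ('ADD',)]
```

Each `code.append(...)` plays the role of a filled-in code template for the small piece of the program that was just recognized.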
20
Code generator (cont)
The code generator can generate
– Executable code
– Advantage: fast
– Some aspects of optimization can still take place by observing the final linear instruction stream
OR
– An intermediate language representation that is close to assembler but has additional information
– Makes it easier for optimizers to perform further optimizations to generate faster code
21
Intermediate Language
All code generation is machine dependent, as we must know the instruction set of a computer to generate code for it
Intermediate form: the syntax and semantics of the source statements have been completely analyzed, but the actual translation into machine code has not yet been performed
Transportable: from one machine to another (Intel, Motorola, etc.)
Can be processed by interpreters (TM, JM – byte code)
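An interpreter for an intermediate form can be quite small. The sketch below runs a made-up stack-machine code; the instruction names and the `run` function are assumptions for the example and are not the instruction set of TM, JM, or any real byte code.

```python
def run(code, variables):
    """Interpret a list of stack-machine instructions (hypothetical format)."""
    stack = []
    for instruction in code:
        op = instruction[0]
        if op == "PUSH":
            stack.append(instruction[1])              # constant operand
        elif op == "LOAD":
            stack.append(variables[instruction[1]])   # read a variable
        elif op == "STORE":
            variables[instruction[1]] = stack.pop()   # write a variable
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return variables


# Intermediate code that a compiler might emit for: A = B + 3
program = [("LOAD", "B"), ("PUSH", 3), ("ADD",), ("STORE", "A")]
result = run(program, {"B": 4})
print(result)   # {'B': 4, 'A': 7}
```

The same `program` list would run unchanged on any machine that has this interpreter, which is the portability argument made on the slide.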
22
BNF – Backus Naur Form
Describes the grammar of a language
– Set of tokens called terminal symbols: for things like numbers, keywords, predefined symbols
– Set of definitions called non-terminal symbols: for example, a ::= b | c (a is either b or c)
– The definitions create a system in which every legal structure can be represented
– Grammars are typically recursive, so recursion can be used to parse the grammar
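The link between a recursive grammar and a recursive parser can be sketched by storing BNF-style definitions as data and checking input against them with one recursive function. The grammar below, the `GRAMMAR` dictionary layout, and the names `derive` and `is_legal` are assumptions made for the example; the approach is plain backtracking and only works for grammars without left recursion.

```python
# Assumed grammar, written as data:
#   expr  ::= term "+" expr | term
#   term  ::= "(" expr ")" | digit
#   digit ::= "0" | "1" | ... | "9"
# Keys are non-terminals; any symbol that is not a key is a terminal.
GRAMMAR = {
    "expr":  [["term", "+", "expr"], ["term"]],
    "term":  [["(", "expr", ")"], ["digit"]],
    "digit": [[d] for d in "0123456789"],
}


def derive(symbol, tokens, pos):
    """Yield every position where `symbol` can end when it starts at `pos`."""
    if symbol not in GRAMMAR:                  # terminal symbol
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for alternative in GRAMMAR[symbol]:        # try each alternative in turn
        ends = [pos]
        for part in alternative:
            ends = [e2 for e in ends for e2 in derive(part, tokens, e)]
        yield from ends


def is_legal(tokens):
    """True if the whole token list can be derived from `expr`."""
    return len(tokens) in derive("expr", tokens, 0)


print(is_legal(list("1+(2+3)")))   # True
print(is_legal(list("1+")))        # False
```

The recursion in `derive` mirrors the recursion in the definition of `expr`, which refers to itself both directly and through `term`.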
23
BNF example
24
BNF Example (cont)
25
Summary
Compilers can recognize when templates or objects are instantiated and destroyed
– These are part of the language definition
– Once the pattern is matched, the compiler can output intermediate-level code to support these operations
Template parameters can be filled in
Calls are made to appropriate routines to construct/destroy objects
26
Summary (cont)
Interpreters
– Give flexibility but are slower
– Can modify the interpreted program on the fly and see the impact immediately without regenerating code
– Interpretation can be at the source program level or at an intermediate language level