Published by Eleanor Lang. Modified over 9 years ago.
1
Compilers
[Diagram: Source language -> Scanner (lexical analysis) -> tokens -> Parser (syntax analysis) -> syntactic structure -> Semantic Analysis (IC generator) -> syntactic/semantic structure -> Code Optimizer / Code Generator -> Machine language. The phases share a Symbol Table. The Computer runs the machine language on the Input Data to produce Output.]
2
Interpreters
[Diagram: Source language and Input Data both feed the Interpreter, which produces Output.]
3
Hybrid
[Diagram: Source language -> Scanner (lexical analysis) -> tokens -> Parser (syntax analysis) -> syntactic structure -> Semantic Analysis (IC generator) -> Intermediate Code -> Interpreter. The phases share a Symbol Table. The Interpreter runs the intermediate code on the Input Data to produce Output.]
4
The Compilation Process
Source Program:
    if (a >= b+1) { a *= 2; } …
Front End (analysis): Lexical Analysis, Syntax Analysis, Semantic Analysis, Intermediate Code Gen
Intermediate Representation:
    _t1 = b + 1
    _t2 = a < _t1
    if _t2 goto L0
    …
Back End (synthesis): IR Optimization, Object Code Gen, Object Code Optimization
Target Program:
    lw $t1, -16($fp)
    add $t0, $t1, 1
    …
5
The Analysis Stage
Broken up into four phases:
Lexical Analysis (also called scanning or tokenization)
Parsing
Semantic Analysis
Intermediate Code Generation
6
Lexical Analysis and Scanners/Lexers
Lexical analysis is the first phase of compilation, in which the compiler recognizes the symbols (tokens) of the source code.
Lexical analyzers, also called scanners or lexers, are usually subroutines or coroutines of the parser: the parser asks for the next token from the source file, and the lexer returns that token.
7
Lexing Example
double d1; double d2; d2 = d1 * 2.0;

Lexeme   Token            Description
double   TOK_DOUBLE       reserved word
d1       TOK_ID           variable name
;        TOK_PUNCT        has value of “;”
double   TOK_DOUBLE       reserved word
d2       TOK_ID           variable name
;        TOK_PUNCT        has value of “;”
d2       TOK_ID           variable name
=        TOK_OPER         has value of “=”
d1       TOK_ID           variable name
*        TOK_OPER         has value of “*”
2.0      TOK_FLOAT_CONST  has value of 2.0
;        TOK_PUNCT        has value of “;”
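A minimal sketch of a lexer for this example, written in Python as an assumption (the slides do not name an implementation language). It uses one regular expression per token kind and a generator so that a parser can pull tokens one at a time with next(). The names Token, TOKEN_SPEC and tokens are hypothetical and only illustrate the idea.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str     # token name, e.g. TOK_ID
    lexeme: str   # the matched source text


# Order matters: the reserved word is tried before the general identifier rule.
TOKEN_SPEC = [
    ("TOK_DOUBLE", r"double\b"),
    ("TOK_FLOAT_CONST", r"[0-9]+\.[0-9]+"),
    ("TOK_ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("TOK_OPER", r"[=*]"),
    ("TOK_PUNCT", r";"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokens(source: str) -> Iterator[Token]:
    """Yield tokens on demand, the way a parser would request them."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = m.end()
        if m.lastgroup != "SKIP":          # whitespace produces no token
            yield Token(m.lastgroup, m.group())


lexer = tokens("double d1; double d2; d2 = d1 * 2.0;")
first = next(lexer)    # the parser asks for one token
rest = list(lexer)     # ...and later for the remaining ones
```

Because tokens() is a generator, no token is produced until it is requested, which mirrors the parser-driven arrangement described earlier.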
8
Lexical Analysis – sequences
Expression:
    Base * base - 0x4 * height * width
Token sequence:
    name:Base operator:times name:base operator:minus hexConstant:4 operator:times name:height operator:times name:width
Lexical phase returns token and value
9
Tokens and lexemes
Lexers work with patterns, tokens, and lexemes.
Patterns formally describe tokens in some way.
Tokens are the terminal symbols in the grammar for the language.
Lexemes are the actual strings that match the patterns.
10
Example

Token        Lexeme     Pattern Description
int
identifier   MyVar      letter followed by digits or letters
literal      ``foo''    characters enclosed in quotes
11
Expressing Patterns for Tokens
As you may have already guessed (or know), the easiest way to specify a token is with a regular expression.
12
Regexes
Regular expressions (regexes) are used to describe (regular) languages. Here are the rules of regular expressions:
The empty string, ε, is a regular expression
A symbol is a regular expression (e.g., a)
If R and S are regexes, then so are:
    R|S (denoting R or S)
    RS (concatenation)
    R* (zero or more of R)
    (R) (grouping)
13
Regex Conventions
There are various conventions used in the world of regular expressions to make things a bit easier.
R+ (one or more of R)
R? (zero or one of R)
[a-z], [A-Z], [0-9] (character classes)
. - any single character/symbol
Precedence rules for operators avoid excessive parentheses. All operators group left-to-right.
    *, + and ? have the highest precedence
    concatenation is second highest
    | is the lowest
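The precedence rules can be checked with a small experiment. This is a sketch that assumes Python's re module, whose operators follow the same precedence conventions; the helper matches and the sample strings are made up for illustration, and agreeing on a handful of samples is evidence rather than proof.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True when the whole of `text` is matched by `pattern`."""
    return re.fullmatch(pattern, text) is not None


# * binds tighter than concatenation, which binds tighter than |,
# so ab|cd* should read as (ab)|(c(d*)).
implicit = "ab|cd*"
explicit = "(ab)|(c(d*))"
samples = ["ab", "c", "cddd", "abd", "abcd", "cdcd", ""]
same = all(matches(implicit, s) == matches(explicit, s) for s in samples)

# R+ is shorthand for RR*.
plus_ok = all(
    matches("[0-9]+", s) == matches("[0-9][0-9]*", s) for s in ["", "7", "42", "4a"]
)
```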
14
Examples
a...b - five letter words starting with a and ending with b
a*(bb)*a* - words made of a's, then an even-length run of b's, then a's (so every match has an even number of b's)
.*(ing|er)s? - words ending with ing or er, optionally followed by a single s
[0-9]+\.[0-9]+(e|E)-?[0-9](l|L|f|F)? - simplified version of floating point constants in C (the backslash (\) means ``take the next character literally'')
(R|ε)* - equivalent to R*
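The example patterns can be tried against a few words. The sketch below assumes Python's re module and whole-word matching via re.fullmatch; the test words are invented for illustration. Note that the floating point pattern requires an exponent, and that a*(bb)*a* rejects bab because its two b's are not adjacent.

```python
import re


def accepts(pattern: str, word: str) -> bool:
    """True when `pattern` matches the entire word."""
    return re.fullmatch(pattern, word) is not None


FLOAT = r"[0-9]+\.[0-9]+(e|E)-?[0-9](l|L|f|F)?"

# (pattern, word) -> whether the word is expected to be accepted
checks = {
    ("a...b", "axyzb"): True,
    ("a...b", "ab"): False,              # only two letters
    ("a*(bb)*a*", "abba"): True,
    ("a*(bb)*a*", "aba"): False,         # odd number of b's
    ("a*(bb)*a*", "bab"): False,         # b's are not adjacent
    (".*(ing|er)s?", "singers"): True,
    (".*(ing|er)s?", "song"): False,
    (FLOAT, "3.14e-2f"): True,
    (FLOAT, "3.14"): False,              # the exponent is mandatory here
}
results = {key: accepts(*key) for key in checks}
```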
15
Another Example
Expression -> Expression + Expression
            | ...
            | Variable
            | Constant
Variable -> T_IDENTIFIER
Constant -> T_INTCONSTANT | T_DOUBLECONSTANT
16
The Parse
a + 2
Expression -> Expression + Expression
           -> Variable + Expression
           -> T_IDENTIFIER + Expression
           -> T_IDENTIFIER + Constant
           -> T_IDENTIFIER + T_INTCONSTANT
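A minimal parser sketch for this grammar fragment, in Python. The grammar as written is left-recursive, so this sketch swaps in the usual loop form (an operand followed by zero or more "+ operand"), which accepts the same strings for the + case. The token name T_PLUS and the tuple-based tree shape are assumptions made for illustration.

```python
from typing import List, Tuple

Token = Tuple[str, str]   # (token kind, lexeme)


def parse_operand(tokens: List[Token], pos: int):
    """Variable -> T_IDENTIFIER, Constant -> T_INTCONSTANT | T_DOUBLECONSTANT."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, lexeme = tokens[pos]
    if kind == "T_IDENTIFIER":
        return ("Variable", lexeme), pos + 1
    if kind in ("T_INTCONSTANT", "T_DOUBLECONSTANT"):
        return ("Constant", lexeme), pos + 1
    raise SyntaxError(f"unexpected token {kind} at position {pos}")


def parse_expression(tokens: List[Token]):
    """Expression -> operand ('+' operand)*, building a left-leaning tree."""
    tree, pos = parse_operand(tokens, 0)
    while pos < len(tokens) and tokens[pos][0] == "T_PLUS":
        right, pos = parse_operand(tokens, pos + 1)
        tree = ("Add", tree, right)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos][0]} at position {pos}")
    return tree


# Token stream for: a + 2
tree = parse_expression(
    [("T_IDENTIFIER", "a"), ("T_PLUS", "+"), ("T_INTCONSTANT", "2")]
)
```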
17
Semantic Analysis
The syntactically correct parse tree (or derivation) is checked for semantic errors.
Check for constructs that, while syntactically valid, do not obey the semantic rules of the source language.
Examples:
Use of an undeclared/un-initialized variable
Function called with improper arguments
Incompatible operands and type mismatches
18
Examples
void fun1(int i); double d; d = fun1(2.1);
int i; int j; i = i + 2;
int arr[2], c; c = arr * 10;
Most semantic analysis pertains to the checking of types.
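The three examples can be run through a toy checker. This Python sketch makes several assumptions: the Checker class and its methods are hypothetical, types are plain strings, and the rules are deliberately strict (a real C compiler would implicitly convert 2.1 to int rather than reject the call). It only illustrates the kinds of checks involved.

```python
from typing import Dict, List, Set, Tuple

NUMERIC = {"int", "double"}


class Checker:
    def __init__(self) -> None:
        self.vars: Dict[str, str] = {}                       # name -> declared type
        self.funcs: Dict[str, Tuple[List[str], str]] = {}    # name -> (params, result)
        self.initialized: Set[str] = set()
        self.errors: List[str] = []

    def use(self, name: str) -> str:
        """Type of a variable read; records undeclared or uninitialized uses."""
        if name not in self.vars:
            self.errors.append(f"{name}: undeclared")
            return "error"
        if name not in self.initialized:
            self.errors.append(f"{name}: used before initialization")
        return self.vars[name]

    def call(self, func: str, arg_types: List[str]) -> str:
        params, result = self.funcs[func]
        if arg_types != params:   # strict comparison: an assumption of this sketch
            self.errors.append(f"{func}: expected {params}, got {arg_types}")
        return result

    def binary(self, op: str, left: str, right: str) -> str:
        if left in NUMERIC and right in NUMERIC:
            return "double" if "double" in (left, right) else "int"
        self.errors.append(f"{op}: incompatible operands {left} and {right}")
        return "error"

    def assign(self, target: str, value_type: str) -> None:
        if value_type == "void":
            self.errors.append(f"{target}: cannot assign a void result")
        self.initialized.add(target)


checker = Checker()
checker.vars.update({"d": "double", "i": "int", "j": "int", "arr": "int[2]", "c": "int"})
checker.funcs["fun1"] = (["int"], "void")
checker.initialized.add("arr")   # simplification: the array name itself is usable

checker.assign("d", checker.call("fun1", ["double"]))                 # d = fun1(2.1);
checker.assign("i", checker.binary("+", checker.use("i"), "int"))     # i = i + 2;
checker.assign("c", checker.binary("*", checker.use("arr"), "int"))   # c = arr * 10;
```

After the three statements, checker.errors holds one message per problem: the argument mismatch, the void result, the uninitialized read of i, and the array operand.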
19
Intermediate Code Generation
Where the intermediate representation of the source program is created.
The representation can have a variety of forms, but a common one is called three-address code (TAC).
Like assembly, the TAC is a sequence of simple instructions, each of which can have at most three operands.
20
Example
a = b * c + b * d

_t1 = b * c
_t2 = b * d
_t3 = _t1 + _t2
a = _t3

Note temps
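One way to produce such code is a post-order walk over the expression tree, creating a fresh temporary for each operator. The Python sketch below assumes the expression has already been parsed into nested tuples; TacGenerator and the tuple layout are hypothetical names chosen for illustration.

```python
from typing import List, Tuple, Union

# An expression is a variable name or a tuple (operator, left, right).
Expr = Union[str, Tuple[str, "Expr", "Expr"]]


class TacGenerator:
    def __init__(self) -> None:
        self.code: List[str] = []
        self.temps = 0

    def new_temp(self) -> str:
        self.temps += 1
        return f"_t{self.temps}"

    def expr(self, node: Expr) -> str:
        """Emit code for `node` and return the name that holds its value."""
        if isinstance(node, str):
            return node                       # a plain variable needs no code
        op, left, right = node
        lhs = self.expr(left)                 # operands first (post-order)
        rhs = self.expr(right)
        temp = self.new_temp()
        self.code.append(f"{temp} = {lhs} {op} {rhs}")
        return temp

    def assign(self, target: str, node: Expr) -> None:
        self.code.append(f"{target} = {self.expr(node)}")


gen = TacGenerator()
gen.assign("a", ("+", ("*", "b", "c"), ("*", "b", "d")))   # a = b * c + b * d
print("\n".join(gen.code))
```

For this input the generator emits the same four instructions as the example, with each instruction using at most three names.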
21
Another Example
if (a <= b) a = a - c;
c = b * c;

_t1 = a > b
if _t1 goto L0
_t2 = a - c
a = _t2
L0: _t3 = b * c
c = _t3

Note:
Temps
Symbolic addresses
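To see that the three-address code really computes the same thing as the source statement, it can be executed by a tiny interpreter. This is a Python sketch under stated assumptions: operands are variable names only, labels are written as "L0:" prefixes, and the function names run and source are hypothetical.

```python
import operator
from typing import Dict

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    ">": lambda x, y: int(x > y),
}

TAC = """\
_t1 = a > b
if _t1 goto L0
_t2 = a - c
a = _t2
L0: _t3 = b * c
c = _t3
"""


def run(program: str, env: Dict[str, int]) -> Dict[str, int]:
    lines = [ln.strip() for ln in program.splitlines() if ln.strip()]
    labels: Dict[str, int] = {}
    for index, line in enumerate(lines):          # resolve symbolic addresses
        if ":" in line:
            label, rest = line.split(":", 1)
            labels[label.strip()] = index
            lines[index] = rest.strip()
    pc = 0
    while pc < len(lines):
        parts = lines[pc].split()
        pc += 1
        if parts[0] == "if":                      # if x goto L
            if env[parts[1]]:
                pc = labels[parts[3]]
        elif len(parts) == 3:                     # x = y
            env[parts[0]] = env[parts[2]]
        else:                                     # x = y op z
            env[parts[0]] = OPS[parts[3]](env[parts[2]], env[parts[4]])
    return env


def source(a: int, b: int, c: int):
    """The original statement pair, written directly in Python."""
    if a <= b:
        a = a - c
    c = b * c
    return a, c


env = run(TAC, {"a": 1, "b": 5, "c": 2})
```

Note how the condition is inverted in the TAC: the jump skips the assignment when a > b, which is the same as executing it when a <= b.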
22
Backend (Synthesis) Basic Steps
Intermediate Code optimization
Object Code Generation
Object Code Optimization
Synthesis is not as deterministic/predictable as analysis. Thus, synthesis must be conservative, and this is why optimizing can be lengthy and not ``perfect''.
23
Intermediate Code optimization
Input is IR, output is optimized IR.
What are some of the optimizations that can be performed?
Algebraic simplifications (*1, /1, *0, factoring, etc.)
Moving invariant code out of loops
Removal of isolated code and unused variables
24
IR Optimization
Optimizations take place both on the IR and on the actual machine code.
However, the optimizations done at the IR stage can be applied to any program, regardless of the target architecture.
The optimizations done on machine/object code usually exploit some feature of the target architecture.
What does this say about a JITC approach?
25
Example
Before:
_t1 = b * c
_t2 = _t1 + 0
_t3 = b * c
_t4 = _t2 + _t3
a = _t4

After:
_t1 = b * c
_t2 = _t1 + _t1
a = _t2
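A rough sketch of how such a rewrite could be automated, in Python. It combines three simple passes in one loop: algebraic simplification (x + 0 becomes x), common subexpression elimination, and copy propagation. Several things here are assumptions: the function names are hypothetical, the unoptimized input is taken to include the line _t4 = _t2 + _t3, and the code is straight-line with every name assigned only once (otherwise the bookkeeping would need invalidation).

```python
from typing import Dict, List, Tuple

BEFORE = """\
_t1 = b * c
_t2 = _t1 + 0
_t3 = b * c
_t4 = _t2 + _t3
a = _t4
"""


def parse(code: str) -> List[Tuple[str, ...]]:
    """Turn 'x = y op z' or 'x = y' lines into tuples (x, y, op, z) or (x, y)."""
    out = []
    for line in code.strip().splitlines():
        dest, rhs = line.split("=")
        out.append((dest.strip(), *rhs.split()))
    return out


def optimize(code: List[Tuple[str, ...]]) -> List[Tuple[str, ...]]:
    copies: Dict[str, str] = {}                  # temp -> name it merely copies
    seen: Dict[Tuple[str, ...], str] = {}        # expression -> name holding it
    result: List[Tuple[str, ...]] = []
    for dest, *rhs in code:
        rhs = [copies.get(x, x) for x in rhs]    # copy propagation
        if len(rhs) == 3 and rhs[1] == "+" and rhs[2] == "0":
            rhs = [rhs[0]]                       # algebraic simplification
        if len(rhs) == 3 and tuple(rhs) in seen:
            rhs = [seen[tuple(rhs)]]             # common subexpression
        if len(rhs) == 1 and dest.startswith("_t"):
            copies[dest] = rhs[0]                # temp is a pure copy: drop it
            continue
        if len(rhs) == 3:
            seen[tuple(rhs)] = dest
        result.append((dest, *rhs))
    return result


def show(code: List[Tuple[str, ...]]) -> List[str]:
    return [f"{dest} = {' '.join(rhs)}" for dest, *rhs in code]


optimized = show(optimize(parse(BEFORE)))
```

The result has the same shape as the optimized sequence in the example; only the surviving temporary is named _t4 rather than _t2, because this sketch does not renumber temporaries.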
26
Object Code Generation
The output of this stage is machine or assembly code.
Variables get mapped to memory locations (variables are just a shorthand for that anyway).
Symbolic instructions are replaced by actual machine instructions.
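A deliberately naive sketch of this step, in Python, emitting MIPS-style assembly text. The assumptions are significant: every variable gets a hypothetical 4-byte slot below the frame pointer, every instruction loads its operands and stores its result (no register allocation), and only "x = y op z" instructions are handled. The function name generate and the frame layout are illustrative, not a description of any particular compiler.

```python
from typing import Dict, List

MNEMONIC = {"+": "add", "-": "sub", "*": "mul"}


def generate(tac: List[str]) -> List[str]:
    offsets: Dict[str, int] = {}

    def slot(name: str) -> str:
        # Map each variable to a memory location on first use.
        if name not in offsets:
            offsets[name] = -4 * (len(offsets) + 1)
        return f"{offsets[name]}($fp)"

    asm: List[str] = []
    for line in tac:
        dest, left, op, right = line.replace("=", " ").split()
        asm.append(f"lw $t0, {slot(left)}")              # load first operand
        asm.append(f"lw $t1, {slot(right)}")             # load second operand
        asm.append(f"{MNEMONIC[op]} $t2, $t0, $t1")      # symbolic op -> machine op
        asm.append(f"sw $t2, {slot(dest)}")              # store the result
    return asm


asm = generate(["_t1 = b * c", "_t2 = b * d", "_t3 = _t1 + _t2"])
```

The redundant loads and stores this produces are exactly the kind of thing a later object code optimization pass would try to remove.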
27
Object Code Optimization
May follow code generation
Optional – only on demand
Variable levels
Like IR optimization, may be expensive
Exploits machine detail
Examples:
Register pools
Instruction pipelining