1
COMP 433 – Theory of Compilers (Level – 10)
Unit 1 – Introduction to Compilers
Unit 2 – Syntax Analysis
Unit 3 – Intermediate Code Generation
Unit 4 – Code Generation
Unit 5 – Code Optimization
2
Unit – 1 : Introduction To Compilers
- Analysis of Source Program
- Phases of a Compiler
- Cousins of Compilers
- Grouping of Phases
- Compiler Construction Tools
- Lexical Analysis
- Role of Lexical Analyzer
- Input Buffering
- Specification of Tokens
3
Definitions
What is a compiler?
- A program that accepts as input a program text in a certain language and produces as output a program text in another language, while preserving the meaning of that text (Grune et al., 2000).
- A program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language) (Aho et al.).
What is an interpreter?
- A program that reads a source program and produces the results of executing that source.
We deal with compilers, but many of the same issues arise with interpreters.
4
What is a Compiler?
- A program that translates a program in one language into another language
- The essential interface between applications and architectures
- Typically lowers the level of abstraction; analyzes and reasons about the program and the architecture
- We expect the program to be optimized, i.e., better than the original, ideally exploiting architectural strengths and hiding weaknesses
5
Overview of Compilers
Compilation process:
  Compile time: Source program -> Compiler -> Object program
  Run time: Object program + Data -> Executing computer -> Results
Interpretive process:
  Source program + Data -> Interpreter -> Result
6
What Do Compilers Do (1)
- A compiler acts as a translator, transforming human-oriented programming languages into computer-oriented machine languages.
- It lets the programmer ignore machine-dependent details.
Programming Language (Source) -> Compiler -> Machine Language (Target)
7
What Do Compilers Do (2) Compilers may generate three types of code:
- Pure Machine Code: code for the machine instruction set that does not assume the existence of any operating system or library. Mostly used for operating systems or embedded applications.
- Augmented Machine Code: code that relies on OS routines and runtime support routines. This is the more common case.
- Virtual Machine Code: virtual instructions that can be run on any architecture with a virtual machine interpreter or a just-in-time compiler. Example: Java.
8
What Do Compilers Do (3)
Another way that compilers differ from one another is in the format of the target machine code they generate:
- Assembly or other source format
- Relocatable binary: uses relative addresses; a linkage step is required
- Absolute binary: uses absolute addresses; can be executed directly
9
Compiler vs. Interpreter (1/5)
Compilers: Translate a source (human-writable) program to an executable (machine-readable) program Interpreters: Convert a source program and execute it at the same time.
10
Compiler vs. Interpreter (2/5)
Ideal concept:
  Compiler: Source code -> Compiler -> Executable; then Input data -> Executable -> Output data
  Interpreter: Source code + Input data -> Interpreter -> Output data
11
Compiler vs. Interpreter (3/5)
Most languages are usually thought of as using either one or the other:
- Compilers: FORTRAN, COBOL, C, C++, Pascal, PL/1
- Interpreters: Lisp, Scheme, BASIC, APL, Perl, Python, Smalltalk
BUT: they are not always implemented this way
- Virtual machines (e.g., Java)
- Linking of executables at runtime
- JIT (just-in-time) compiling
12
Compiler vs. Interpreter (4/5)
In practice there is no sharp boundary between them. The general situation is a combination:
  Source code -> Translator -> Intermediate code
  Intermediate code + Input data -> Virtual machine -> Output
13
Compiler vs. Interpreter (5/5)
Compiler
  Pros: less space; fast execution
  Cons: slow processing (partly solved by separate compilation); debugging (improved through IDEs)
Interpreter
  Pros: easy debugging; fast development
  Cons: not for large projects (exceptions: Perl, Python); requires more space; slower execution (the interpreter is in memory all the time)
14
Programs related to Compiler
15
Interpreters
- Execute the source program immediately rather than generating object code
- Examples: BASIC, LISP; often used in educational or development situations
- Speed of execution is slower than compiled code by a factor of 10 or more
- Share many of their operations with compilers
16
Assemblers
- A translator for the assembly language of a particular computer
- Assembly language is a symbolic form of one machine language
- A compiler may generate assembly language as its target language, and an assembler then finishes the translation into object code
17
Linkers
- Collect separate object files into a directly executable file
- Connect an object program to the code for standard library functions and to resources supplied by the OS
- Becoming one of the principal activities of a compiler; depends on the OS and processor
18
Loaders
- Resolve all relocatable addresses relative to a given base
- Make executable code more flexible
- Often part of the operating environment, rarely an actual separate program
19
Preprocessors
- Delete comments, include other files, and perform macro substitutions
- May be required by a language (as in C) or may be later add-ons that provide additional facilities
20
Editors
- Compilers are often bundled together with an editor and other programs into an interactive development environment (IDE)
- Editors oriented toward the format or structure of the programming language are called structure-based
- May include some operations of a compiler, such as reporting some errors
21
Debuggers
- Used to determine execution errors in a compiled program
- Keep track of most or all of the source code information
- Halt execution at pre-specified locations called breakpoints
- Must be supplied with appropriate symbolic information by the compiler
22
Profilers
- Collect statistics on the behavior of an object program during execution
  - Number of times each procedure is called
  - Percentage of execution time spent in each procedure
- Used to improve the execution speed of the program
23
Project Managers
- Coordinate the files being worked on by different people and maintain a coherent version of a program
- Language-independent or bundled together with a compiler
- Two popular project manager programs on Unix systems:
  - SCCS (Source Code Control System)
  - RCS (Revision Control System)
24
The Many Phases of a Compiler
Source Program
Analysis:
  1. Lexical Analyzer
  2. Syntax Analyzer
  3. Semantic Analyzer
Synthesis:
  4. Intermediate Code Generator
  5. Code Optimizer
  6. Code Generator
  7. Peephole Optimization
Target Program
The Symbol-table Manager and the Error Handler work alongside the phases.
Phases 1, 2, 3, 4, 5: Front-End. Phases 6, 7: Back-End.
25
Phase 1. Lexical Analysis
The easiest analysis: identify tokens, which are the basic building blocks.
For example:
  position := initial + rate * 60 ;
Each of position, :=, initial, +, rate, *, 60, and ; is a token.
Blanks, line breaks, etc. are scanned out.
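The grouping of characters into lexemes described on this slide can be sketched in C. This is a minimal illustration, not the course's scanner: the names `next_lexeme`, `count_lexemes`, and `MAX_LEXEME` are hypothetical, and the only patterns assumed are identifiers, integer numbers, the two-character operator `:=`, and single-character symbols.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEXEME 32   /* hypothetical limit; longer lexemes are split */

/* Hypothetical helper: copies the next lexeme from *src into out and
   advances *src past it. Returns 1 if a lexeme was found, 0 at end of input. */
int next_lexeme(const char **src, char out[MAX_LEXEME])
{
    const char *p = *src;
    size_t n = 0;

    while (isspace((unsigned char)*p))        /* blanks, line breaks are scanned out */
        p++;
    if (*p == '\0') {
        *src = p;
        return 0;
    }

    if (isalpha((unsigned char)*p)) {         /* identifier: letter (letter|digit)* */
        while (isalnum((unsigned char)*p) && n < MAX_LEXEME - 1)
            out[n++] = *p++;
    } else if (isdigit((unsigned char)*p)) {  /* number: digit+ */
        while (isdigit((unsigned char)*p) && n < MAX_LEXEME - 1)
            out[n++] = *p++;
    } else if (p[0] == ':' && p[1] == '=') {  /* two-character assignment operator */
        out[n++] = *p++;
        out[n++] = *p++;
    } else {                                  /* any other single character */
        out[n++] = *p++;
    }
    out[n] = '\0';
    *src = p;
    return 1;
}

/* Counts the lexemes in a string by calling next_lexeme until input ends. */
int count_lexemes(const char *src)
{
    char lexeme[MAX_LEXEME];
    int count = 0;

    while (next_lexeme(&src, lexeme))
        count++;
    return count;
}
```

Calling `count_lexemes("position := initial + rate * 60 ;")` walks through the eight tokens listed on the slide.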
26
Phase 2. Syntax Analysis or Parsing
For example, we would have a parse tree:
  assignment statement
    identifier: position
    :=
    expression
      expression
        identifier: initial
      +
      expression
        expression
          identifier: rate
        *
        expression
          number: 60
Nodes of the tree are constructed using a grammar for the language.
27
Phase 3. Semantic Analysis
Finds semantic errors.
One of the most important activities in this phase: type checking (legality of operands).
Syntax tree before the conversion action: := (position, + (initial, * (rate, 60)))
Syntax tree after the conversion action: := (position, + (initial, * (rate, inttoreal(60))))
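The conversion action on this slide can be sketched as a bottom-up pass over the syntax tree. The node layout and the names `Node`, `new_node`, and `check` are assumptions made for illustration; the only rule implemented is the one shown here, wrapping an integer operand in `inttoreal` when the other operand is real.

```c
#include <stdlib.h>

typedef enum { TY_INT, TY_REAL } Type;
typedef enum { N_ID, N_NUM, N_BINOP, N_INTTOREAL } Kind;

/* Hypothetical syntax-tree node. */
typedef struct Node {
    Kind kind;
    Type type;
    char op;                      /* '+' or '*' for N_BINOP, 0 otherwise */
    struct Node *left, *right;
} Node;

Node *new_node(Kind kind, Type type, char op, Node *left, Node *right)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->kind = kind;
    n->type = type;
    n->op = op;
    n->left = left;
    n->right = right;
    return n;
}

/* Wraps an integer operand in an inttoreal conversion node. */
static Node *to_real(Node *n)
{
    if (n->type == TY_REAL)
        return n;
    return new_node(N_INTTOREAL, TY_REAL, 0, n, NULL);
}

/* Bottom-up type check: when the operand types differ, the integer
   operand is converted and the operation becomes real. */
Type check(Node *n)
{
    if (n->kind != N_BINOP)
        return n->type;

    Type lt = check(n->left);
    Type rt = check(n->right);
    if (lt != rt) {
        n->left = to_real(n->left);
        n->right = to_real(n->right);
        n->type = TY_REAL;
    } else {
        n->type = lt;
    }
    return n->type;
}
```

For `rate * 60` with `rate` real and `60` an integer, `check` replaces the right operand with an `inttoreal` node whose child is the original `60`.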
28
Supporting Phases / Activities for Analysis
Symbol Table Creation / Maintenance
- Contains info (storage, type, scope, args) on each "meaningful" token, typically identifiers
- Data structure created / initialized during lexical analysis
- Utilized / updated during later analysis and synthesis
Error Handling
- Detection of different errors, which correspond to all phases
- What happens when an error is found?
29
The Synthesis Task For Compilation
Intermediate Code Generation
- Abstract machine version of the code, independent of architecture
- Easy to produce, and easy to turn into final, machine-dependent code
Code Optimization
- Find more efficient ways to execute the code
- Replace code with more efficient statements
Final Code Generation
- Generate relocatable, machine-dependent code
Peephole Optimization
- With a very limited view, improves the produced final code
30
Reviewing the Entire Process
position := initial + rate * 60
  -> lexical analyzer ->
id1 := id2 + id3 * 60
  -> syntax analyzer ->
syntax tree: := (id1, + (id2, * (id3, 60)))
  -> semantic analyzer ->
syntax tree: := (id1, + (id2, * (id3, inttoreal(60))))
  -> intermediate code generator
Symbol table: position ..., initial ..., rate ...
Errors
31
Reviewing the Entire Process
Symbol table: position ..., initial ..., rate ...
Errors
intermediate code generator (three-address code):
  t1 := inttoreal(60)
  t2 := id3 * t1
  t3 := id2 + t2
  id1 := t3
code optimizer:
  t1 := id3 * 60.0
  id1 := id2 + t1
final code generator:
  MOVF id3, R2
  MULF #60.0, R2
  MOVF id2, R1
  ADDF R2, R1
  MOVF R1, id1
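One of the optimizer's steps on this slide, replacing `inttoreal(60)` with the constant `60.0`, can be sketched over a quadruple representation of three-address code. The `Quad` layout and the function `fold_inttoreal` are hypothetical, and the sketch assumes each temporary is assigned only once. It does not perform the second step shown on the slide (removing the copy through `t3` and renaming temporaries).

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16

/* Hypothetical quadruple: dst := arg1 op arg2 (arg2 empty for unary ops). */
typedef struct {
    char op[NAME_LEN];      /* "inttoreal", "*", "+", or ":=" */
    char dst[NAME_LEN];
    char arg1[NAME_LEN];
    char arg2[NAME_LEN];
} Quad;

static int is_int_literal(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (*s < '0' || *s > '9')
            return 0;
    return 1;
}

/* Replaces arg with `to` if it currently names `from`. */
static void replace_use(char *arg, const char *from, const char *to)
{
    if (strcmp(arg, from) == 0)
        snprintf(arg, NAME_LEN, "%s", to);
}

/* Folds "t := inttoreal(<integer literal>)" into later uses of t.
   Rewrites code[] in place and returns the new instruction count. */
int fold_inttoreal(Quad *code, int n)
{
    int out = 0;

    for (int i = 0; i < n; i++) {
        if (strcmp(code[i].op, "inttoreal") == 0 && is_int_literal(code[i].arg1)) {
            char real[NAME_LEN + 2];
            snprintf(real, sizeof real, "%s.0", code[i].arg1);
            for (int j = i + 1; j < n; j++) {
                replace_use(code[j].arg1, code[i].dst, real);
                replace_use(code[j].arg2, code[i].dst, real);
            }
            continue;                 /* drop the conversion instruction */
        }
        code[out++] = code[i];
    }
    return out;
}
```

Applied to the four instructions produced by the intermediate code generator, this leaves three: `t2 := id3 * 60.0`, `t3 := id2 + t2`, `id1 := t3`.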
32
The Phases of a Compiler
Phase | Output | Sample
Programmer (source code producer) | Source string | A=B+C;
Scanner (performs lexical analysis) | Token string, and symbol table with names | 'A', '=', 'B', '+', 'C', ';'
Parser (performs syntax analysis based on the grammar of the programming language) | Parse tree or abstract syntax tree | ; at the root with child =, whose children are A and + (with children B and C)
Semantic analyzer (type checking, etc.) | Annotated parse tree or abstract syntax tree |
Intermediate code generator | Three-address code, quads, or RTL | int2fp B t t1 C t2 := t A
Optimizer | Three-address code, quads, or RTL | int2fp B t t1 #2.3 A
Code generator | Assembly code | MOVF #2.3,r1 ADDF2 r1,r2 MOVF r2,A
Peephole optimizer | Assembly code | ADDF2 #2.3,r2 MOVF r2,A
33
The Structure of a Compiler (1)
Any compiler must perform two major tasks Analysis of the source program Synthesis of a machine-language program Compiler Analysis Synthesis
34
The Structure of a Compiler (2)
Source Program (Character Stream) -> Scanner -> Tokens -> Parser -> Syntactic Structure -> Semantic Routines -> Intermediate Representation -> Optimizer -> Code Generator -> Target machine code
Symbol and Attribute Tables (used by all phases of the compiler)
35
The Structure of a Compiler (3)
Scanner
- The scanner begins the analysis of the source program by reading the input, character by character, and grouping characters into individual words and symbols (tokens)
Related topics:
- RE (Regular Expression)
- NFA (Non-deterministic Finite Automata)
- DFA (Deterministic Finite Automata)
- LEX
36
The Structure of a Compiler (4)
Parser
- Given a formal syntax specification (typically a context-free grammar [CFG]), the parser reads tokens and groups them into units as specified by the productions of the CFG being used.
- As syntactic structure is recognized, the parser either calls the corresponding semantic routines directly or builds a syntax tree.
Related topics:
- CFG (Context-Free Grammar)
- BNF (Backus-Naur Form)
- GAA (Grammar Analysis Algorithms)
- LL, LR, SLR, LALR parsers
- YACC
37
The Structure of a Compiler (5)
Semantic Routines
- Perform two functions:
  - Check the static semantics of each construct
  - Do the actual translation
- The heart of a compiler
Related topics:
- Syntax-Directed Translation
- Semantic Processing Techniques
- IR (Intermediate Representation)
38
The Structure of a Compiler (6)
Optimizer
- The IR code generated by the semantic routines is analyzed and transformed into functionally equivalent but improved IR code
- This phase can be very complex and slow
- Peephole optimization, loop optimization, register allocation, code scheduling
Related topics:
- Register and Temporary Management
- Peephole Optimization
39
The Structure of a Compiler (7)
Code Generator
Related topics:
- Interpretive Code Generation
- Generating Code from Tree/DAG
- Grammar-Based Code Generator
40
The Structure of a Compiler (8)
Scanner [Lexical Analyzer] -> Tokens
Parser [Syntax Analyzer] -> Parse tree
Semantic Process [Semantic Analyzer] -> Abstract Syntax Tree w/ Attributes
Code Generator [Intermediate Code Generator] -> Non-optimized Intermediate Code
Code Optimizer -> Optimized Intermediate Code
Code Optimizer -> Target machine code
41
LEXICAL ANALYSIS The role of the lexical analyzer
First phase of a compiler
1. Main task
- To read the input characters
- To produce a sequence of tokens used by the parser for syntax analysis
- Acts as an assistant to the parser
42
LEXICAL ANALYSIS The role of the lexical analyzer
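2. Interaction of the lexical analyzer with the parser
  Source program -> Lexical analyzer -> (token) -> Parser
  Parser -> (get next token) -> Lexical analyzer
  Both the lexical analyzer and the parser use the symbol table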
43
LEXICAL ANALYSIS The role of the lexical analyzer
3. Processes in lexical analyzers
- Scanning
  - Pre-processing: strip out comments and white space; macro functions
  - Correlating error messages from the compiler with the source program (a line number can be associated with an error message)
- Lexical analysis
44
LEXICAL ANALYSIS The role of the lexical analyzer
4. Terms of the lexical analyzer
- Token: a type of word in the source program, such as keywords, operators, identifiers, constants, literal strings, and punctuation symbols (such as commas and semicolons)
- Lexeme: an actual word in the source program
- Pattern: a rule describing the set of lexemes that can represent a particular token in the source program
- Example: the token "relation" is represented by the lexemes {<, <=, >, >=, ==, <>}
45
LEXICAL ANALYSIS The role of the lexical analyzer
5. Attributes for Tokens
- A pointer to the symbol-table entry in which the information about the token is kept
- E.g. E=M*C**2
  <id, pointer to symbol-table entry for E>
  <assign_op, >
  <id, pointer to symbol-table entry for M>
  <multi_op, >
  <id, pointer to symbol-table entry for C>
  <exp_op, >
  <num, integer value 2>
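The token/attribute pairs above can be sketched in C as a small structure plus a scanner for this one statement form. Everything named here (`Token`, `TokenKind`, `sym_lookup`, `next_token`, the fixed-size `symtab`) is hypothetical, and the symbol-table "pointer" is represented as an array index for simplicity.

```c
#include <ctype.h>
#include <string.h>

typedef enum {
    TOK_ID, TOK_NUM, TOK_ASSIGN_OP, TOK_MULT_OP, TOK_EXP_OP, TOK_EOF, TOK_ERROR
} TokenKind;

typedef struct {
    TokenKind kind;
    int attr;   /* TOK_ID: symbol-table index; TOK_NUM: integer value; else unused */
} Token;

#define MAX_SYMS 64
#define MAX_NAME 32
static char symtab[MAX_SYMS][MAX_NAME];   /* hypothetical symbol table of names */
static int nsyms = 0;

/* Returns the index of the name, inserting it if new; -1 if the table is full. */
int sym_lookup(const char *name, size_t len)
{
    if (len >= MAX_NAME)
        len = MAX_NAME - 1;               /* over-long names are truncated */
    for (int i = 0; i < nsyms; i++)
        if (strlen(symtab[i]) == len && memcmp(symtab[i], name, len) == 0)
            return i;
    if (nsyms == MAX_SYMS)
        return -1;
    memcpy(symtab[nsyms], name, len);
    symtab[nsyms][len] = '\0';
    return nsyms++;
}

/* Scans one token starting at *src and advances *src past its lexeme. */
Token next_token(const char **src)
{
    const char *p = *src;
    Token t = { TOK_EOF, 0 };

    while (isspace((unsigned char)*p))
        p++;
    if (isalpha((unsigned char)*p)) {
        const char *start = p;
        while (isalnum((unsigned char)*p))
            p++;
        t.kind = TOK_ID;
        t.attr = sym_lookup(start, (size_t)(p - start));
    } else if (isdigit((unsigned char)*p)) {
        t.kind = TOK_NUM;
        while (isdigit((unsigned char)*p))
            t.attr = t.attr * 10 + (*p++ - '0');
    } else if (*p == '=') {
        t.kind = TOK_ASSIGN_OP;
        p++;
    } else if (p[0] == '*' && p[1] == '*') {
        t.kind = TOK_EXP_OP;              /* check ** before * */
        p += 2;
    } else if (*p == '*') {
        t.kind = TOK_MULT_OP;
        p++;
    } else if (*p != '\0') {
        t.kind = TOK_ERROR;
        p++;
    }
    *src = p;
    return t;
}
```

Repeated calls on `"E=M*C**2"` yield the seven pairs listed on the slide, with E, M, and C entered into the table in that order.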
46
LEXICAL ANALYSIS The role of the lexical analyzer
6. Lexical Errors
Possible repair actions:
- Deleting an extraneous character
- Inserting a missing character
- Replacing an incorrect character by a correct character
- Transposing two adjacent characters (such as fi => if)
- Pre-scanning
47
LEXICAL ANALYSIS The role of the lexical analyzer
7. Input Buffering
- Two-buffer input scheme to look ahead on the input and identify tokens
- Buffer pairs
- Sentinels (guards)
48
LEXICAL ANALYSIS The role of the lexical analyzer
1. Regular Definition of Tokens
Tokens are defined by regular expressions, e.g.
  id -> letter (letter | digit)*
  letter -> A | B | … | Z | a | b | … | z
  digit -> 0 | 1 | 2 | … | 9
Notes: Regular expressions are an important notation for specifying patterns. Each pattern matches a set of strings, so regular expressions serve as names for sets of strings.
49
LEXICAL ANALYSIS The role of the lexical analyzer
2. Regular Expression & Regular Language
- Regular expression: a notation that allows us to define a pattern in a high-level language.
- Regular language: each regular expression r denotes a language L(r) (the set of sentences described by the regular expression r).
Notes: Each word in a program can be expressed by a regular expression.
50
LEXICAL ANALYSIS The role of the lexical analyzer
3. The rules for regular expressions over an alphabet Σ
1) ε is a regular expression that denotes {ε}
   - ε is the regular expression
   - {ε} is the related regular language
2) If a is a symbol in Σ, then a is a regular expression that denotes {a}
   - a is the regular expression
   - {a} is the related regular language
51
LEXICAL ANALYSIS The role of the lexical analyzer
3. The rules for regular expressions over an alphabet Σ
3) Suppose α and β are regular expressions; then α|β, αβ, α*, and β* are also regular expressions
Notes: Rules 1) and 2) form the basis of the definition; rule 3) provides the inductive step.
52
LEXICAL ANALYSIS The role of the lexical analyzer
4. Algebraic laws of regular expressions
1) α|β = β|α
2) α|(β|γ) = (α|β)|γ and α(βγ) = (αβ)γ
3) α(β|γ) = αβ|αγ and (α|β)γ = αγ|βγ
4) εα = αε = α
5) (α*)* = α*
6) α* = α+|ε and α+ = αα* = α*α
7) (α|β)* = (α*|β*)* = (α*β*)*
53
LEXICAL ANALYSIS The role of the lexical analyzer
4. Algebraic laws of regular expressions
8) If ε ∉ L(γ), then
   α = β|γα if and only if α = γ*β
   α = β|αγ if and only if α = βγ*
Notes: We assume that the precedence of * is the highest, the precedence of | is the lowest, and that the operators are left associative.
54
LEXICAL ANALYSIS The role of the lexical analyzer
5. Notational Shorthands
a) One or more instances: (r)+, e.g. digit+
b) Zero or one instance: r? is a shorthand for r|ε, e.g. (E(+|-)?digits)?
c) Character classes: [a-z] denotes a|b|c|…|z; other examples are [A-Za-z] and [A-Za-z0-9]
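These shorthands map directly onto POSIX extended regular expressions, so they can be tried out with the `regcomp`/`regexec` functions from `<regex.h>`. The sketch below assumes a POSIX system; the helper `matches` and the two pattern macros are hypothetical names, with `NUM_PATTERN` combining the slide's `digit+` and `(E(+|-)?digits)?` and `ID_PATTERN` using the character classes.

```c
#include <regex.h>
#include <stddef.h>

/* letter (letter|digit)* written with character classes */
#define ID_PATTERN  "^[A-Za-z][A-Za-z0-9]*$"
/* digit+ followed by the optional exponent (E(+|-)?digits)? */
#define NUM_PATTERN "^[0-9]+(E[+-]?[0-9]+)?$"

/* Hypothetical helper: returns 1 if text matches the extended regular
   expression, 0 if it does not, and -1 if the pattern fails to compile. */
int matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For instance, `matches(NUM_PATTERN, "6E+23")` succeeds because the exponent part is present and well formed, while `matches(NUM_PATTERN, "6E")` fails because `digits` after `E` requires at least one digit.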
55
LEXICAL ANALYSIS The Specification of Tokens
1. Task of recognizing tokens in a lexical analyzer
- Isolate the lexeme for the next token in the input buffer
- Produce as output a pair consisting of the appropriate token and attribute-value, such as <id, pointer to table entry>, using the translation table given on the next slide
56
LEXICAL ANALYSIS The Specification of Tokens
1. Task of recognizing tokens in a lexical analyzer
Regular expression | Token | Attribute-value
if | if | -
id | id | Pointer to table entry
< | relop | LT
57
LEXICAL ANALYSIS The Specification of Tokens
2. Method for recognizing tokens: use a transition diagram
58
LEXICAL ANALYSIS The Specification of Tokens
3. Transition Diagram (stylized flowchart)
Depicts the actions that take place when a lexical analyzer is called by the parser to get the next token.
  start state --( > )--> state 6
  state 6 --( = )--> state 7 (accepting): return(relop, GE)
  state 6 --( other )--> state 8 * (accepting): return(relop, GT)
Notes: Here we use '*' to indicate states on which input retraction must take place.
59
LEXICAL ANALYSIS The Specification of Tokens
4. Implementing a Transition Diagram
- Each state gets a segment of code
- If there are edges leaving a state, then its code reads a character and selects an edge to follow, if possible
- Use nextchar() to read the next character from the input buffer
60
LEXICAL ANALYSIS The Specification of Tokens
4. Implementing a Transition Diagram
    while (1) {
        switch (state) {
        case 0:
            c = nextchar();
            /* skip white space and move the start of the lexeme forward */
            if (c == blank || c == tab || c == newline) {
                state = 0;
                lexeme_beginning++;
            }
            else if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else state = fail();
            break;
        case 9:
            c = nextchar();
            if (isletter(c)) state = 10;
            else state = fail();
            break;
        …
        }
    }
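A self-contained version of the relational-operator part of this loop can be sketched as follows. States 0, 1, 5, 6, 7, and 8 follow the slides; the transitions out of state 1 (to states 2, 3, and 4 for `<=`, `<>`, and `<`) are an assumption based on the usual textbook diagram and on the relation lexemes listed earlier. The function name `scan_relop` and the use of a string pointer in place of `nextchar()` are also hypothetical.

```c
typedef enum { LT, LE, EQ, NE, GT, GE, NOT_RELOP } Relop;

/* Recognizes a relational operator at the start of *src. On success,
   *src is advanced past the lexeme; on failure it is left unchanged. */
Relop scan_relop(const char **src)
{
    const char *p = *src;
    int state = 0;
    int c;

    for (;;) {
        switch (state) {
        case 0:
            c = *p++;
            if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else return NOT_RELOP;         /* fail(): not a relational operator */
            break;
        case 1:                            /* seen '<' (assumed transitions) */
            c = *p++;
            if (c == '=') state = 2;
            else if (c == '>') state = 3;
            else state = 4;
            break;
        case 2: *src = p; return LE;
        case 3: *src = p; return NE;
        case 4: *src = p - 1; return LT;   /* '*' state: retract one character */
        case 5: *src = p; return EQ;
        case 6:                            /* seen '>' */
            c = *p++;
            state = (c == '=') ? 7 : 8;
            break;
        case 7: *src = p; return GE;
        case 8: *src = p - 1; return GT;   /* '*' state: retract one character */
        }
    }
}
```

On the input `>5`, the scanner reads `5` while in state 6, moves to state 8, and retracts so that `5` is still available for the next token.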
61
LEXICAL ANALYSIS The Specification of Tokens
5. A generalized transition diagram
- Finite automaton (FA)
- An FA may be deterministic or non-deterministic
- Non-deterministic means that more than one transition out of a state may be possible on the same input symbol
62
LEXICAL ANALYSIS The Specification of Tokens
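The model of recognition of tokens
- Input buffer: holds the source characters, e.g. i f d 2 = …
- Lexeme_beginning: marks the start of the current lexeme in the input buffer
- FA simulator: reads characters from the input buffer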
63
LEXICAL ANALYSIS The Specification of Tokens
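E.g., the FA simulator for identifiers represents the rule:
  identifier = letter (letter | digit)*
The start state moves to an accepting state on a letter, and the accepting state loops to itself on a letter or a digit.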
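The identifier automaton can be simulated with a small transition table. This is a sketch under assumed names (`delta`, `char_class`, `is_identifier`); it adds an explicit dead state so that every input character has a defined transition.

```c
#include <ctype.h>

enum { S_START, S_IN_ID, S_DEAD, NUM_STATES };
enum { C_LETTER, C_DIGIT, C_OTHER, NUM_CLASSES };

/* Transition table for identifier = letter (letter | digit)* */
static const int delta[NUM_STATES][NUM_CLASSES] = {
    /*              letter    digit     other  */
    /* S_START */ { S_IN_ID,  S_DEAD,   S_DEAD },
    /* S_IN_ID */ { S_IN_ID,  S_IN_ID,  S_DEAD },   /* accepting state */
    /* S_DEAD  */ { S_DEAD,   S_DEAD,   S_DEAD }
};

/* Maps an input character to its column in the table. */
static int char_class(int c)
{
    if (isalpha(c)) return C_LETTER;
    if (isdigit(c)) return C_DIGIT;
    return C_OTHER;
}

/* Runs the automaton over the whole string; returns 1 if it ends in
   the accepting state, 0 otherwise. */
int is_identifier(const char *s)
{
    int state = S_START;

    for (; *s != '\0'; s++)
        state = delta[state][char_class((unsigned char)*s)];
    return state == S_IN_ID;
}
```

With this table, `d2` is accepted (letter, then digit) while `2d` moves to the dead state on its first character and is rejected.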
64
INPUT BUFFERING Speeding up the reading of the source program
Look one or more characters beyond the next lexeme There are many situations where we need to look at least one additional character ahead.
65
INPUT BUFFERING
For instance, we cannot be sure we've seen the end of an identifier until we see a character that is not a letter or digit, and therefore is not part of the lexeme for id. In C, single-character operators like -, =, or < could also be the beginning of a two-character operator like ->, ==, or <=. We therefore introduce a two-buffer scheme that handles large lookaheads safely. We then consider an improvement involving "sentinels" that saves time checking for the ends of buffers.
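The buffer-pair scheme with sentinels can be sketched as follows. To keep the example self-contained, the input comes from an in-memory string rather than a file, the NUL character serves as the sentinel (so the source text is assumed to contain no NUL), and the half size is deliberately tiny so that reloads actually occur. The names `Scanner`, `load_half`, `scanner_init`, and `next_char` are hypothetical, and the `lexeme_beginning` pointer is omitted.

```c
#include <string.h>

#define HALF_SIZE 4          /* tiny on purpose, so both halves get reloaded */
#define SENTINEL '\0'        /* assumed never to occur inside the source text */

typedef struct {
    const char *src;                  /* input not yet copied into the buffer */
    char buf[2 * (HALF_SIZE + 1)];    /* two halves, each followed by a sentinel slot */
    char *forward;                    /* next character to be read */
} Scanner;

/* Fills one half with up to HALF_SIZE characters and writes a sentinel
   right after them. A sentinel before the end of the half marks end of input. */
static void load_half(Scanner *sc, char *half)
{
    size_t n = 0;

    while (n < HALF_SIZE && sc->src[n] != '\0')
        n++;
    memcpy(half, sc->src, n);
    half[n] = SENTINEL;
    sc->src += n;
}

void scanner_init(Scanner *sc, const char *text)
{
    sc->src = text;
    load_half(sc, sc->buf);
    sc->forward = sc->buf;
}

/* Returns the next input character, or SENTINEL at end of input. */
int next_char(Scanner *sc)
{
    char *end1 = sc->buf + HALF_SIZE;            /* sentinel slot of first half */
    char *end2 = sc->buf + 2 * HALF_SIZE + 1;    /* sentinel slot of second half */

    for (;;) {
        char c = *sc->forward;
        if (c != SENTINEL) {                     /* common case: a single test */
            sc->forward++;
            return (unsigned char)c;
        }
        if (sc->forward == end1) {               /* end of first half: reload second */
            load_half(sc, end1 + 1);
            sc->forward = end1 + 1;
        } else if (sc->forward == end2) {        /* end of second half: reload first */
            load_half(sc, sc->buf);
            sc->forward = sc->buf;
        } else {
            return SENTINEL;                     /* sentinel inside a half: end of input */
        }
    }
}
```

The point of the sentinel is visible in `next_char`: for ordinary characters only one comparison is made, and the checks for the end of each half run only when a sentinel is actually seen.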
66
END OF UNIT - 1