Published by Eleanor Anthony. Modified over 9 years ago.
1
Compiler Construction By: Muhammad Nadeem Edited By: M. Bilal Qureshi
2
Compiler Construction Lecture 2
3
Compilation Process
Source Code (something we can understand easily) → Compilation Process → Object Code (something the computer can understand easily)
The compilation process may also produce Error Messages.
4
Phases of a Compiler (Structure of a Compiler)
Analysis (front end of the compiler) and Synthesis (back end of the compiler)
5
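Source Code → Lexical Analyzer → (tokens) → Syntax Analyzer → (syntax tree) → Semantic Analyzer → Intermediate Code Generator → (intermediate representation) → Code Optimizer → (intermediate representation) → Code Generator → Object Code
Analysis (front end): Lexical Analyzer, Syntax Analyzer, Semantic Analyzer, Intermediate Code Generator.
Synthesis (back end): Code Optimizer, Code Generator.
The Symbol Table Manager and the Error Handler work with all phases.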
6
The analysis part breaks the source program into its constituent pieces and checks them against the grammar (syntax) of the language. It then uses this structure to generate an intermediate representation of the source program. If the source program is found to be syntactically incorrect or semantically unsound, appropriate error messages are generated so that the user can take corrective action. The symbol table is a data structure that collects information about the source program and passes it, along with the intermediate representation, to the synthesis part. The synthesis part constructs the desired target program from the intermediate representation and the symbol table information.
7
Example: Z = X + 10;
Token | Token_ID
Z | 1
= | 2
X | 1
+ | 2
10 | 3
Symbol table
Token_ID | Class
1 | Variable
2 | Operator
3 | Number
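As a minimal sketch of how the Token_ID column above could be filled in, the hypothetical class TokenIdTable below classifies each lexeme of Z = X + 10; as Variable (1), Operator (2), or Number (3). It assumes lexemes are separated by spaces and drops the ';', which the table does not list; a real scanner would not rely on spaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the slide's table: each lexeme of a statement is
// assigned a Token_ID (1 = Variable, 2 = Operator, 3 = Number).
class TokenIdTable {
    static final int VARIABLE = 1;
    static final int OPERATOR = 2;
    static final int NUMBER = 3;

    // Returns the Token_ID of one lexeme, or -1 if it is none of the three classes.
    static int tokenId(String lexeme) {
        if (lexeme.matches("[A-Za-z][A-Za-z0-9]*")) return VARIABLE;
        if (lexeme.matches("[=+\\-*/]")) return OPERATOR;
        if (lexeme.matches("[0-9]+")) return NUMBER;
        return -1;
    }

    // Assumes lexemes are separated by spaces; the ';' is dropped because the
    // slide's table does not list it.
    static List<String> rows(String statement) {
        List<String> rows = new ArrayList<>();
        for (String lexeme : statement.replace(";", "").trim().split("\\s+")) {
            rows.add(lexeme + " " + tokenId(lexeme));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : rows("Z = X + 10;")) {
            System.out.println(row);   // Z 1, = 2, X 1, + 2, 10 3 (one per line)
        }
    }
}
```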
8
Lexical Analyzer (Scanner)
9
The lexical analyzer reads a stream of characters and groups the characters into tokens.
Learn by example: Position = initial + rate*60
Tokens generated:
1. Identifier#1 Position
2. Assignment Operator =
3. Identifier#2 initial
4. Addition Operator +
5. Identifier#3 rate
6. Multiplication Operator *
7. Number 60
Learn by doing: Percentage = Marks_Obtained / Total * 100
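A hand-written scanner for the worked example might look like the sketch below. The class SimpleScanner is hypothetical and assumes the input holds only identifiers, unsigned integers, white space, and the operators = + - * /. Each new identifier receives the next number, which is how Position, initial, and rate become Identifier#1, #2, and #3.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal hand-written scanner sketch. Assumption: the input holds only
// identifiers (a letter, then letters, digits or '_'), unsigned integers,
// white space, and the operators = + - * /.
class SimpleScanner {
    static List<String> scan(String src) {
        List<String> tokens = new ArrayList<>();
        Map<String, Integer> ids = new LinkedHashMap<>(); // identifier -> its #number
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                       // white space makes no token
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < src.length()
                        && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_')) {
                    i++;
                }
                String name = src.substring(start, i);
                ids.putIfAbsent(name, ids.size() + 1);     // first sighting gets the next number
                tokens.add("Identifier#" + ids.get(name) + " " + name);
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add("Number " + src.substring(start, i));
            } else {
                String kind;
                switch (c) {
                    case '=': kind = "Assignment Operator"; break;
                    case '+': kind = "Addition Operator"; break;
                    case '-': kind = "Subtraction Operator"; break;
                    case '*': kind = "Multiplication Operator"; break;
                    case '/': kind = "Division Operator"; break;
                    default: throw new IllegalArgumentException("Unexpected '" + c + "' at " + i);
                }
                tokens.add(kind + " " + c);
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String t : scan("Position = initial + rate*60")) {
            System.out.println(t);
        }
    }
}
```

Running main prints the seven tokens listed above, in the same order. Note that rate*60 needs no spaces: the scanner stops an identifier at the first character that cannot continue it.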
10
Source Code → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Code Optimizer → Code Generator → Object Code (the Symbol Table Manager and Error Handler serve all phases)
Output of the Lexical Analyzer for the example: id1 = id2 + id3 * number
11
Token
Breaking a stream of characters into tokens is called lexical analysis. The lexical analyzer partitions the input string into substrings, called words, and classifies them according to their role.
Example: consider if(b == 0) a = b
In this statement the words are "if", "(", "b", "==", "0", ")", "a", "=", and "b". Their roles are keyword ("if"), parenthesis ("(" and ")"), variable ("b" and "a"), boolean operator ("=="), number ("0"), and assignment operator ("=").
12
The lexical analyzer pairs each word with its role, for example <keyword, "if">, <variable, "b">, <boolean operator, "==">. Such a <role, word> pair is called a token.
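A token can be modelled directly as a <role, word> pair. The sketch below (hypothetical class PairLexer) recognises only the word kinds that appear in if(b == 0) a = b, using java.util.regex with named groups. The role names are taken from the example and are illustrative, not a fixed standard. "==" is tried before "=" so that the longer operator wins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: a token as a <role, word> pair. Only the word kinds from the
// example are supported; anything else raises an error.
class PairLexer {
    static final class Token {
        final String kind;    // the role, e.g. "keyword"
        final String lexeme;  // the word itself, e.g. "if"
        Token(String kind, String lexeme) { this.kind = kind; this.lexeme = lexeme; }
        @Override public String toString() { return "<" + kind + ", \"" + lexeme + "\">"; }
    }

    // Alternatives are tried left to right, so "==" is listed before "=".
    private static final Pattern TOKEN = Pattern.compile(
            "(?<kw>if\\b)|(?<id>[A-Za-z]\\w*)|(?<num>\\d+)|(?<eq>==)|(?<assign>=)|(?<punct>[()])");

    static List<Token> lex(String src) {
        List<Token> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        int pos = 0;
        while (pos < src.length()) {
            if (Character.isWhitespace(src.charAt(pos))) { pos++; continue; } // skip blanks
            m.region(pos, src.length());
            if (!m.lookingAt()) throw new IllegalArgumentException("Bad input at " + pos);
            String kind = m.group("kw") != null ? "keyword"
                    : m.group("id") != null ? "variable"
                    : m.group("num") != null ? "number"
                    : m.group("eq") != null ? "boolean operator"
                    : m.group("assign") != null ? "assignment operator"
                    : "parenthesis";
            out.add(new Token(kind, m.group()));
            pos = m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lex("if(b == 0) a = b"));
    }
}
```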
13
Specification/description of tokens
Regular languages are the most popular way to specify tokens. Regular languages can be described using regular expressions: each regular expression is a notation for a regular language (a set of words). If A is a regular expression, we write L(A) for the language denoted by A.
A regular expression (RE) is defined inductively:
a : an ordinary character from the alphabet Σ
ε : the empty string
R|S : either R or S
RS : R followed by S (concatenation)
R* : R concatenated zero or more times (R* = ε | R | RR | RRR | ...)
14
Here are some REs and the strings of the languages they denote.
RE        Strings in L(RE)
a         "a"
ab        "ab"
a|b       "a", "b"
(ab)*     "", "ab", "abab", ...
(a|ε)b    "ab", "b"
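The table can be checked mechanically with java.util.regex. Java regex syntax is close to the textbook notation. The main difference is that ε has no symbol of its own, so (a|ε)b is written with an empty alternative, (a|)b. Pattern.matches tests the whole string, which corresponds to asking whether s is in L(RE). The class name RegexTable is hypothetical.

```java
import java.util.regex.Pattern;

// Checks the RE table with java.util.regex. In Java syntax, ε is written as
// an empty alternative: (a|ε)b becomes (a|)b.
class RegexTable {
    // Whole-string match, i.e. "is s in L(re)?"
    static boolean inLanguage(String re, String s) {
        return Pattern.matches(re, s);
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"a", "a"}, {"ab", "ab"}, {"a|b", "a"}, {"a|b", "b"},
            {"(ab)*", ""}, {"(ab)*", "ab"}, {"(ab)*", "abab"},
            {"(a|)b", "ab"}, {"(a|)b", "b"}
        };
        for (String[] r : rows) {
            System.out.println(r[0] + " matches \"" + r[1] + "\": " + inLanguage(r[0], r[1]));
        }
        // A string outside the language, for contrast:
        System.out.println("(ab)* matches \"aba\": " + inLanguage("(ab)*", "aba"));
    }
}
```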
15
Role of Lexical Analyzer
1. Removal of white space
2. Removal of comments
3. Recognizes constants
4. Recognizes keywords
5. Recognizes identifiers
6. Correlates error messages with the source program
16
1. Removal of white space
By white space we mean blanks, tabs, and newlines.
Why? White space is generally used only to format source code and does not change its meaning, so A = B + C equals A=B+C.
17
1. Removal of white space: learn by example
// This is beginning of my code
int A;
int B = 2;
int C = 33;
A = B + C ;
/* This is end of my code */
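Removing white space does not mean deleting it blindly. "int A" must not become "intA". The sketch below (hypothetical class WhitespaceDemo) treats blanks, tabs, and newlines only as separators that never become tokens. It assumes a word is a run of letters or digits and that every other non-blank character is a one-character symbol.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: blanks, tabs, and newlines end the current word but never become
// tokens themselves. Every other non-letter, non-digit is a 1-char symbol.
class WhitespaceDemo {
    static List<String> tokens(String src) {
        List<String> out = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : src.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                word.append(c);                          // still inside a word
                continue;
            }
            if (word.length() > 0) {                     // the word ends here
                out.add(word.toString());
                word.setLength(0);
            }
            if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
                out.add(String.valueOf(c));              // single-character symbol
            }
        }
        if (word.length() > 0) out.add(word.toString());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("A = B + C"));         // [A, =, B, +, C]
        System.out.println(tokens("A=B+C"));             // [A, =, B, +, C]
        System.out.println(tokens("int A;\n\tint B = 2;"));
    }
}
```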
18
2. Removal of comments
Why? Comments are text added by the programmer for human readers. They do not contribute to the meaning of the program.
Example in Java:
// This is beginning of my code
int A;
int B = 2;
int C = 33;
A = B + C ;
/* This is end of my code */
Both comments mean nothing to the program.
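A minimal comment stripper for the two Java comment forms is sketched below (hypothetical class CommentStripper). It assumes the input contains no string or character literals. A real scanner must not treat "//" inside "http://..." as a comment. A block comment is replaced by one space because it still separates tokens. A missing */ is reported as an error.

```java
// Sketch: remove // and /* */ comments before tokenizing.
// Assumption: no string or char literals in the input.
class CommentStripper {
    static String strip(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < src.length()) {
            if (src.startsWith("//", i)) {
                int nl = src.indexOf('\n', i);
                if (nl < 0) break;                  // comment runs to end of input
                i = nl;                             // keep the newline itself
            } else if (src.startsWith("/*", i)) {
                int end = src.indexOf("*/", i + 2);
                if (end < 0) throw new IllegalStateException("Unterminated comment");
                out.append(' ');                    // a comment still separates tokens
                i = end + 2;
            } else {
                out.append(src.charAt(i++));        // ordinary source character
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String code = "// This is beginning of my code\n"
                + "int A; int B = 2; int C = 33;\n"
                + "A = B + C ;\n"
                + "/* This is end of my code */";
        System.out.println(strip(code));
    }
}
```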
19
3. Recognizes constants/numbers
How is recognition done? A sequence of consecutive digits in the source code is recognized as a numeric constant (here 2 and 33).
Example in Java:
// This is beginning of my code
int A;
int B = 2 ;
int C = 33 ;
A = B + C ;
/* This is end of my code */
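The rule "consecutive digits form a constant" can be sketched as follows (hypothetical class ConstantFinder). It assumes only unsigned decimal integers, so forms such as 3.14, 0x1F, or 10L are out of scope. It skips whole identifiers first, so the digit in a name like B2 is not mistaken for a constant.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: each maximal run of digits is reported as one numeric constant.
// Assumption: only unsigned decimal integers (no 3.14, 0x1F, or 10L).
class ConstantFinder {
    static List<String> constants(String src) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isLetter(c) || c == '_') {
                // skip the whole identifier, so the 2 in "B2" is not taken as a constant
                while (i < src.length()
                        && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_')) {
                    i++;
                }
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                found.add(src.substring(start, i));
            } else {
                i++;                                   // operators, ';', white space
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(constants("int A; int B = 2 ; int C = 33 ; A = B + C ;")); // [2, 33]
    }
}
```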
20
4. Recognizes keywords
Keywords in C and Java: if, else, for, while, do, return, etc.
How is recognition done? By comparing each sequence of letters against the keywords predefined in the grammar of the programming language.
Example in Java:
int A;
int B = 2 ;
int C = 33 ;
if ( B < C ) A = B + C ;
else A = C - B ;
"int" is considered a keyword because of the character sequence i, n, t.
"if" is considered a keyword because of the character sequence i, f.
"else" is considered a keyword because of the character sequence e, l, s, e.
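One common way to implement this comparison is to read the complete word first and then look it up in a keyword table. The sketch below (hypothetical class KeywordTable) lists only the keywords named above, not Java's full reserved-word list. Because the lookup happens after the whole word is read, "iffy" is not split into "if" + "fy". Because Java is case sensitive, "If" is an identifier rather than a keyword.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch: a complete word (letters/digits) is looked up in a keyword table.
// Only a few of Java's reserved words are listed here.
class KeywordTable {
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "int", "if", "else", "for", "while", "do", "return"));

    // Classifies a word that the scanner has already read in full.
    static String classify(String word) {
        return KEYWORDS.contains(word) ? "keyword" : "identifier";
    }

    public static void main(String[] args) {
        String[] words = {"int", "A", "if", "B", "else", "iffy", "If"};
        for (String w : words) {
            System.out.println(w + " -> " + classify(w));
        }
    }
}
```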
21
5. Recognizes identifiers
What are identifiers? Names of variables, functions, arrays, etc.
How is recognition done? If a sequence of letters, with or without digits and starting with a letter, is not a keyword, the compiler treats it as an identifier.
Where is the identifier stored? When an identifier is detected, it is entered into the symbol table.
Example in Java:
// This is beginning of my code
int A;
int B2 = 2 ;
int C4R = 33 ;
A = B2 + C4R ;
/* This is end of my code */
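The two rules can be combined: a word that is not a keyword is an identifier, and each new identifier gets a symbol-table entry. The sketch below (hypothetical class SymbolTableDemo) reduces the symbol table to a map from name to entry number. It assumes comments have already been removed, since words inside comments would otherwise be collected too. Real symbol tables also store type, scope, and other attributes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: every word that is not a keyword is an identifier and gets an entry
// in a (very simplified) symbol table. Assumption: comments already removed.
class SymbolTableDemo {
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "int", "if", "else", "for", "while", "do", "return"));
    private static final Pattern WORD = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Maps each identifier to its entry number, in order of first appearance.
    static Map<String, Integer> collect(String src) {
        Map<String, Integer> table = new LinkedHashMap<>();
        Matcher m = WORD.matcher(src);
        while (m.find()) {
            String word = m.group();
            if (!KEYWORDS.contains(word)) {
                table.putIfAbsent(word, table.size() + 1);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        String code = "int A; int B2 = 2 ; int C4R = 33 ; A = B2 + C4R ;";
        System.out.println(collect(code));   // {A=1, B2=2, C4R=3}
    }
}
```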
22
6. Correlates error messages with the source program
How? The lexical analyzer keeps track of the number of newline characters seen in the source code, so it can report the line number when an error message is generated.
Example in Java:
1. This is beginning of my code
2. int A;
3. int B2 = 2 ;
4. int C4R = 33 ;
5. A = B2 + C4R ;
6. /* This is
7. end of
8. my code
9. */
Error message at line 1: no // was inserted at the beginning of the comment.
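Newline counting can be sketched as a character reader that increments a line counter whenever it consumes '\n' (hypothetical class LineTrackingReader). The check method and its small set of legal characters are assumptions made only to show a diagnostic carrying a line number.

```java
// Sketch: a character reader that counts newline characters, so a diagnostic
// can name the line where a problem was found. The "legal" character set in
// check() is an assumption made for this demo only.
class LineTrackingReader {
    private final String src;
    private int pos = 0;
    private int line = 1;                       // current line, 1-based

    LineTrackingReader(String src) { this.src = src; }

    // Returns the next character, or -1 at end of input, and updates the line count.
    int next() {
        if (pos >= src.length()) return -1;
        char c = src.charAt(pos++);
        if (c == '\n') line++;
        return c;
    }

    int line() { return line; }

    // Reports the first character outside a small legal set, with its line number.
    static String check(String src) {
        LineTrackingReader r = new LineTrackingReader(src);
        for (int c = r.next(); c != -1; c = r.next()) {
            if (!Character.isLetterOrDigit(c) && !Character.isWhitespace(c)
                    && "=;+-*/()<>_".indexOf(c) < 0) {
                return "Error at line " + r.line() + ": illegal character '" + (char) c + "'";
            }
        }
        return "No error";
    }

    public static void main(String[] args) {
        System.out.println(check("int A;\nint B2 = 2 ;\nint @x;")); // Error at line 3: illegal character '@'
    }
}
```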
23
Errors generated by the Lexical Analyzer
1. Illegal symbols, e.g., =>
2. Illegal identifiers, e.g., 2ab
3. Unterminated comments, e.g., /* This is beginning of my code
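The sketch below (hypothetical class LexicalErrors) detects two of these error kinds and a simplified form of the third. Unterminated comments are reported when /* has no matching */. Following the slide, a digit immediately followed by letters, such as 2ab, is treated as an illegal identifier, not as a number followed by a name. For illegal symbols it only flags characters assumed to lie outside the alphabet ('#' and '`'). It does not flag =>, because whether => is rejected by the scanner or left to the parser depends on how a language defines its operator tokens. It also assumes there are no // comments in the input.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of three lexical checks. Assumptions: '#' and '`' are outside the
// alphabet, there are no // comments, and a digit directly followed by a
// letter is an illegal identifier (as on the slide).
class LexicalErrors {
    static List<String> find(String src) {
        List<String> errors = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (src.startsWith("/*", i)) {
                int end = src.indexOf("*/", i + 2);
                if (end < 0) { errors.add("Unterminated comment"); break; }
                i = end + 2;                                  // skip the whole comment
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                String word = src.substring(start, i);
                if (!word.chars().allMatch(Character::isDigit)) {
                    errors.add("Illegal identifier: " + word); // e.g. 2ab
                }
            } else if (Character.isLetter(c)) {
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
            } else {
                if ("#`".indexOf(c) >= 0) errors.add("Illegal symbol: " + c);
                i++;
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(find("int 2ab = 1;"));
        System.out.println(find("/* This is beginning of my code"));
    }
}
```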
24
Learn by example
// Beginning of Code
int a char } switch b[2] =;
// end of code
No error is generated. Why? Every word here is a valid token. Checking whether the tokens are arranged correctly is the job of the syntax analyzer.
25
Terminologies
Lexeme: the actual sequence of characters that matches a pattern and has a given token class.
Examples: Identifier: Name, Data, x. Integer: 345, 2, 0, 629.
Pattern: the rule that characterizes the set of strings for a token.
Examples: Integer: a digit followed by zero or more digits. Identifier: a letter followed by zero or more letters or digits.
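The lexeme/pattern distinction can be made concrete by writing each pattern as a Java regular expression and testing lexemes against it. The class name Patterns and the choice of ASCII letters are assumptions of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch: each token class has a pattern (written here as a Java regex), and a
// lexeme is a concrete string that matches one of those patterns.
class Patterns {
    private static final Map<String, Pattern> CLASSES = new LinkedHashMap<>();
    static {
        CLASSES.put("Integer", Pattern.compile("[0-9][0-9]*"));            // a digit, then zero or more digits
        CLASSES.put("Identifier", Pattern.compile("[A-Za-z][A-Za-z0-9]*")); // a letter, then letters or digits
    }

    // Returns the token class whose pattern matches the whole lexeme.
    static String classOf(String lexeme) {
        for (Map.Entry<String, Pattern> e : CLASSES.entrySet()) {
            if (e.getValue().matcher(lexeme).matches()) {
                return e.getKey();
            }
        }
        return "no match";
    }

    public static void main(String[] args) {
        for (String lexeme : new String[] {"Name", "Data", "x", "345", "2", "0", "629"}) {
            System.out.println(lexeme + " : " + classOf(lexeme));
        }
    }
}
```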
26
27
Syntax Analyzer (Parser)
28
Syntax Analyzer (Parser)
The syntax analyzer uses the tokens produced by the lexical analyzer to create a tree-like intermediate representation. The parse tree depicts the grammatical structure of the token stream.
Example:
Source code: Position = initial + rate*60
Lexical analyzer output: id1 = id2 + id3 * number
Parse tree / syntax tree (children indented under their parent):
=
  id1
  +
    id2
    *
      id3
      number
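One way a parser could build this tree is recursive descent. The sketch below (hypothetical class TinyParser) uses an assumed grammar that places * and / below + and -. That ordering is why id3 * number becomes a subtree of +. The tree is returned as a nested prefix string, (= id1 (+ id2 (* id3 number))), which encodes the same structure as the indented tree above.

```java
import java.util.Arrays;
import java.util.List;

// Recursive-descent sketch for an assumed grammar (not taken from the slides):
//   assign -> operand '=' expr
//   expr   -> term (('+' | '-') term)*
//   term   -> factor (('*' | '/') factor)*
//   factor -> operand (an identifier or number token)
// The syntax tree is returned as a nested prefix string.
class TinyParser {
    private final List<String> toks;
    private int pos = 0;

    TinyParser(List<String> toks) { this.toks = toks; }

    private String peek() { return pos < toks.size() ? toks.get(pos) : ""; }

    private void expect(String t) {
        if (!peek().equals(t)) throw new IllegalStateException("expected " + t + " at token " + pos);
        pos++;
    }

    String assign() {
        String target = factor();          // left-hand side (not checked to be an identifier)
        expect("=");
        String rhs = expr();
        if (pos != toks.size()) throw new IllegalStateException("extra input at token " + pos);
        return "(= " + target + " " + rhs + ")";
    }

    private String expr() {
        String left = term();
        while (peek().equals("+") || peek().equals("-")) {
            String op = toks.get(pos++);
            left = "(" + op + " " + left + " " + term() + ")";   // left-associative
        }
        return left;
    }

    private String term() {
        String left = factor();
        while (peek().equals("*") || peek().equals("/")) {
            String op = toks.get(pos++);
            left = "(" + op + " " + left + " " + factor() + ")"; // binds tighter than + and -
        }
        return left;
    }

    private String factor() {
        String t = peek();
        if (t.isEmpty() || "=+-*/".contains(t)) throw new IllegalStateException("expected operand at token " + pos);
        pos++;
        return t;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("id1", "=", "id2", "+", "id3", "*", "number");
        System.out.println(new TinyParser(tokens).assign());    // (= id1 (+ id2 (* id3 number)))
    }
}
```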
29
Syntax Analyzer (Parser)
=
  id1
  +
    id2
    *
      id3
      60
30
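Syntax Analyzer (Parser)
The same tree, with each leaf linked to its lexeme: id1 is position, id2 is initial, id3 is rate, and number is 60.
=
  id1 (position)
  +
    id2 (initial)
    *
      id3 (rate)
      number (60)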
31
Syntax Analyzer (Parser)
Learn by doing: Percentage = Marks_Obtained / Total * 100
32
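Source Code → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Code Optimizer → Code Generator → Object Code (the Symbol Table Manager and Error Handler serve all phases)
Syntax analyzer output for the example: the tree (= id1 (+ id2 (* id3 number))), whose leaves refer to position, initial, rate, and 60.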