1
Chapter 3 Lexical Analysis
2
Content
Overview of this chapter:
3.1 The Role of the Lexical Analyzer
3.2 Input Buffering
3.3 Specification of Tokens
3.4 Recognition of Tokens
3.5 The Lexical-Analyzer Generator Lex
3.6 Finite Automata
3.7 From Regular Expressions to Automata
3.8 Design of a Lexical-Analyzer Generator
3
Overview
How to construct a lexical analyzer?
By hand:
1) Describe the lexemes of each token
2) Identify each occurrence of each lexeme
3) Return information
Automatically:
1) Lexical-analyzer generator
2) Lexical analyzer
4
3.1 The Role of the Lexical Analyzer
Main tasks of the lexical analyzer:
− Read the input characters
− Group them into lexemes
− Output a sequence of tokens
Other tasks:
− Stripping out comments and whitespace
− Correlating error messages with the source program
5
3.1 The Role of the Lexical Analyzer
Interacts with:
− Parser: for syntax analysis
− Symbol table
6
3.1.1 Lexical Analysis Versus Parsing
Why is the analysis portion separated into lexical analysis and parsing (syntax analysis)?
− Design simplicity
− Compiler efficiency
− Compiler portability
7
3.1.2 Tokens, Patterns, and Lexemes
Token: A pair consisting of a token name and an optional attribute value
Pattern: A description of the form that the lexemes of a token may take
Lexeme: A sequence of characters in the source program that matches the pattern
8
3.1.2 Tokens, Patterns, and Lexemes
Examples of tokens:
9
3.1.3 Attributes for Tokens
An attribute describes the lexeme represented by the token.
Example: E = M*C**2
<id, pointer to symbol-table entry for E>
<assign-op>
<id, pointer to symbol-table entry for M>
<mult-op>
<id, pointer to symbol-table entry for C>
<exp-op>
<number, integer value 2>
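The token stream for E = M*C**2 can be reproduced with a small sketch. It assumes Python's re module stands in for the patterns, a plain list stands in for the symbol table (the list index plays the role of the pointer), and the token names and the function tokenize are illustrative choices rather than anything fixed by the slides.

```python
import re

# Illustrative token names and patterns (assumptions, not from the slides).
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("exp_op", r"\*\*"),   # listed before mult_op so ** is not read as two *
    ("mult_op", r"\*"),
    ("assign_op", r"="),
    ("ws", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for a tiny expression language."""
    symbol_table = []  # index into this list stands in for a pointer
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"no pattern matches at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue  # whitespace is stripped, not passed to the parser
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme)))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((kind, None))  # operators need no attribute
    return tokens, symbol_table


tokens, table = tokenize("E = M*C**2")
for tok in tokens:
    print(tok)
```

Only id and number carry an attribute here; the operator tokens are fully described by their names, which matches the example above.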
10
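3.1.4 Lexical Errors
A lexical analyzer often cannot tell, by itself, that a source-code error has occurred.
e.g. fi(a==f(x))…
The analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier. Since fi is a valid lexeme for the token id, it returns the token id to the parser.
If none of the patterns matches any prefix of the remaining input: "panic mode" recovery (delete successive characters from the remaining input until a well-formed token is found).
Other error-recovery actions:
1. Delete one character
2. Insert a missing character
3. Replace a character
4. Transpose two adjacent characters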
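A minimal sketch of panic-mode recovery, assuming it means skipping one character at a time until some pattern matches again. The pattern set TOKEN and the function scan_with_panic_mode are hypothetical and cover only what the fi(a==f(x)) example needs.

```python
import re

# Hypothetical patterns: identifiers, integers, ==, single-character
# punctuation, and whitespace.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|==|[()=]|\s+")


def scan_with_panic_mode(source):
    """Return (lexemes, skipped); skipped holds characters dropped in recovery."""
    lexemes, skipped = [], []
    pos = 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m:
            if not m.group().isspace():
                lexemes.append(m.group())
            pos = m.end()
        else:
            # Panic mode: no pattern matches a prefix of the remaining
            # input, so delete one character and try again.
            skipped.append(source[pos])
            pos += 1
    return lexemes, skipped


print(scan_with_panic_mode("fi(a==f(x))"))
print(scan_with_panic_mode("a = @#b"))
```

In the first call nothing is skipped: fi is a perfectly good identifier lexeme, so the misspelling can only be caught later by the parser. In the second call the characters @ and # match no pattern and are deleted.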
11
3.2 Input Buffering
Why do we need input buffers?
− We often have to look one or more characters ahead
− To speed up reading of the source program
In this section, we
− Introduce a two-buffer scheme
− Consider an improvement involving "sentinels"
12
3.2.1 Buffer Pairs
lexemeBegin: marks the beginning of the current lexeme
forward: scans ahead until a pattern match is found
13
3.2.2 Sentinels
A special character at the buffer end
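The two-buffer scheme with sentinels can be sketched as follows. This is a simplified model under stated assumptions: the class name TwoBufferReader, the half size n, and the use of "\0" as the sentinel are all illustrative, the sentinel is assumed never to occur in the source text, and only the forward pointer is modelled (lexemeBegin is left out).

```python
import io

EOF = "\0"  # assumed sentinel; must not occur in the source text


class TwoBufferReader:
    """Two halves of n characters, each followed by one sentinel slot."""

    def __init__(self, stream, n=4):
        self.stream = stream
        self.n = n
        self.buf = [EOF] * (2 * (n + 1))
        self._load(0)
        self.forward = 0

    def _load(self, start):
        chunk = self.stream.read(self.n)
        for i, ch in enumerate(chunk):
            self.buf[start + i] = ch
        # After a full read this rewrites the sentinel slot; after a short
        # read it places a sentinel inside the half, marking the real end.
        self.buf[start + len(chunk)] = EOF

    def next_char(self):
        ch = self.buf[self.forward]
        if ch != EOF:  # the common case costs a single test
            self.forward += 1
            return ch
        if self.forward == self.n:  # sentinel at end of first half
            self._load(self.n + 1)
            self.forward = self.n + 1
            return self.next_char()
        if self.forward == 2 * self.n + 1:  # sentinel at end of second half
            self._load(0)
            self.forward = 0
            return self.next_char()
        return None  # sentinel inside a half: end of input


reader = TwoBufferReader(io.StringIO("hello world"), n=4)
chars = []
while True:
    c = reader.next_char()
    if c is None:
        break
    chars.append(c)
print("".join(chars))
```

The point of the sentinel is visible in next_char: the end-of-buffer check is folded into the test on the character itself, so the extra position checks run only when a sentinel is actually seen.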
14
3.3.1 Strings and Languages
Some concepts:
symbol: letters, digits, and punctuation
alphabet: any finite set of symbols, e.g. {0,1}, ASCII, Unicode
string: a finite sequence of symbols
|s|: length of a string s
ε: empty string
language: any countable set of strings, e.g. ∅, {ε}, C programs, English sentences
15
3.3.1 Strings and Languages
Operations on strings:
concatenation: xy
e.g. 1) x = dog, y = house, xy = doghouse
2) εs = sε = s
exponentiation: s^0 = ε, and s^i = s^(i-1)s for i > 0
16
3.3.2 Operations on Languages
union: L ∪ M = {s | s is in L or s is in M}
concatenation: LM = {st | s is in L and t is in M}
closure:
Kleene closure: L* = union of L^i for all i ≥ 0
Positive closure: L+ = union of L^i for all i ≥ 1
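These operations can be tried out on small finite languages represented as Python sets. The sketch assumes the standard definitions of the closures (L* is the union of all powers L^i for i ≥ 0, L+ the union for i ≥ 1); because both closures are infinite in general, the hypothetical helpers kleene_up_to and positive_up_to stop at a chosen power k.

```python
def concat(L, M):
    """LM = {st | s in L and t in M}."""
    return {s + t for s in L for t in M}


def power(L, i):
    """L^i, with L^0 = {empty string}."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result


def kleene_up_to(L, k):
    """Finite approximation of L*: union of L^0 .. L^k."""
    out = set()
    for i in range(k + 1):
        out |= power(L, i)
    return out


def positive_up_to(L, k):
    """Finite approximation of L+: union of L^1 .. L^k."""
    out = set()
    for i in range(1, k + 1):
        out |= power(L, i)
    return out


L = {"a", "b"}
D = {"0", "1"}
print(sorted(L | D))                 # union
print(sorted(concat(L, D)))          # concatenation
print(sorted(kleene_up_to({"a"}, 2)))
print(sorted(positive_up_to({"a"}, 2)))
```

The only difference between the two closures is whether the power L^0 = {ε} is included, so the empty string is in L+ only when it is already in L.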
17
3.3.3 Regular Expressions
Describing languages, e.g. C identifiers: letter_(letter_|digit)*
Notice:
a) Regular expressions are built recursively out of smaller regular expressions
b) Each regular expression r denotes a language L(r)
BASIS: (two rules)
1. ε is a regular expression, and L(ε) is {ε}
2. If a is a symbol in ∑, then a is a regular expression, and L(a) = {a}
18
3.3.3 Regular Expressions
INDUCTION: (r and s are regular expressions denoting L(r) and L(s))
1. (r)|(s) is a regular expression denoting the language L(r) ∪ L(s)
2. (r)(s) is a regular expression denoting the language L(r)L(s)
3. (r)* is a regular expression denoting (L(r))*
4. (r) is a regular expression denoting L(r)
Some conventions:
1. * has highest precedence and is left associative
2. Concatenation has second highest precedence and is left associative
19
3.3.3 Regular Expressions
3. | has lowest precedence and is left associative
e.g. (a)|((b)*(c)) = a|b*c
regular set: A language that can be defined by a regular expression
equivalent: If two regular expressions r and s denote the same regular set, they are equivalent, written r = s
20
3.3.3 Regular Expressions Algebraic laws for regular expressions
21
3.3.4 Regular Definitions
Regular definition: A sequence of definitions of the form:
d1 -> r1
d2 -> r2
…
dn -> rn
where:
1. Each di is a new symbol
2. Each ri is a regular expression
22
3.3.4 Regular Definitions
Example: C identifiers
letter_ -> A|B|…|Z|a|b|…|z|_
digit -> 0|1|…|9
id -> letter_(letter_|digit)*
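The regular definition for C identifiers translates almost line by line into Python's re syntax. In this sketch the names LETTER_, DIGIT, ID and is_identifier are illustrative, and character classes are used as shorthand for the long alternations.

```python
import re

# Each definition may refer to the ones defined before it.
LETTER_ = r"[A-Za-z_]"                       # letter_ -> A|B|...|Z|a|b|...|z|_
DIGIT = r"[0-9]"                             # digit   -> 0|1|...|9
ID = rf"{LETTER_}(?:{LETTER_}|{DIGIT})*"     # id      -> letter_(letter_|digit)*

ID_RE = re.compile(ID)


def is_identifier(s):
    """True if the whole string s matches the id pattern."""
    return ID_RE.fullmatch(s) is not None


for candidate in ["_tmp1", "count", "1abc", ""]:
    print(repr(candidate), is_identifier(candidate))
```

Note that this checks the pattern only; a keyword such as if also matches it, which is why reserved words need separate handling.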
23
3.3.5 Extensions of Regular Expressions
One or more instances: +
1. (r)+ denotes the language (L(r))+
2. r* = r+|ε
3. r+ = rr* = r*r
Zero or one instance: ?
1. r? = r|ε
2. L(r?) = L(r) ∪ {ε}
Character classes:
1. a1|a2|…|an = [a1a2…an]
2. a|b|…|z = [a-z]
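The identities for +, ? and character classes can be spot-checked with Python's re module. The helper same_language below is a hypothetical name; it compares two patterns on every string over a small alphabet up to a length bound, so it is a bounded sanity check and not a proof of equivalence.

```python
import re
from itertools import product


def same_language(p, q, alphabet="ab", max_len=4):
    """True if p and q accept the same strings over alphabet up to max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(re.fullmatch(p, s)) != bool(re.fullmatch(q, s)):
                return False
    return True


print(same_language("a+", "aa*"))        # r+ = rr*
print(same_language("a+", "a*a"))        # r+ = r*r
print(same_language("a*", "(?:a+|)"))    # r* = r+ | empty string
print(same_language("a?", "(?:a|)"))     # r? = r | empty string
print(same_language("[ab]", "(?:a|b)"))  # character class as alternation
print(same_language("a+", "a*"))         # differ on the empty string
```

An empty alternative, as in (?:a|), is how the empty string ε is written in this regex dialect.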
24
3.4 Recognition of Tokens
How to recognize tokens?
Reserved words: if, else, then…
Id: letter
Number: digit
Relop: <, >, =, <=, >=, <>…
Ws: blank, tab, newline…
25
3.4.1 Transition Diagrams
States: each represents a condition
Edges: directed from one state to another
Some conventions:
1. Accepting or final states
2. *: retract the forward pointer one position
3. Start or initial state
26
3.4.1 Transition Diagrams Example: A transition diagram that recognizes the lexemes matching the token relop
27
3.4.2 Recognition of Reserved Words and Identifiers
Two ways to handle reserved words:
1. Install the reserved words in the symbol table initially
2. Create separate transition diagrams for each keyword
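The first approach can be sketched with a dictionary as the symbol table. The class SymbolTable and its methods install_id and get_token are illustrative names, and the keyword list is just the one used in this section.

```python
class SymbolTable:
    """Maps each lexeme to its token name; keywords are pre-installed."""

    def __init__(self, reserved=("if", "then", "else")):
        self.entries = {}
        for word in reserved:
            # A reserved word is its own token name, never id.
            self.entries[word] = word

    def install_id(self, lexeme):
        """Add lexeme as an id unless it is already present; return the key."""
        self.entries.setdefault(lexeme, "id")
        return lexeme

    def get_token(self, lexeme):
        """Token name recorded for lexeme: a keyword or 'id'."""
        return self.entries[lexeme]


table = SymbolTable()
for lexeme in ["if", "count", "else", "count"]:
    key = table.install_id(lexeme)
    print(lexeme, "->", table.get_token(key))
```

With this design a single identifier recognizer handles both cases: whatever it finds is looked up, and the table decides whether the lexeme is a keyword or an ordinary id.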
28
3.4.3 Completion of the Running Example
Transition diagram for token number
Transition diagram for whitespace
29
3.4.4 Architecture of a Transition-Diagram-Based Lexical Analyzer
A sketch of getRelop() to simulate the transition diagram for relop
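The figure itself is not reproduced in this transcript, so the following is one possible simulation of a relop transition diagram in Python. The state numbers, the attribute names (LT, LE, EQ, NE, GT, GE) and the function name get_relop are assumptions; returning without consuming the current character plays the role of a starred state that retracts forward.

```python
def get_relop(text, pos=0):
    """Return ((token, attribute), next_pos), or None if no relop starts at pos."""
    state = 0
    forward = pos
    while True:
        # An empty string stands in for end of input.
        c = text[forward] if forward < len(text) else ""
        if state == 0:
            if c == "<":
                state = 1
            elif c == "=":
                return ("relop", "EQ"), forward + 1
            elif c == ">":
                state = 6
            else:
                return None  # not a relop; another diagram should be tried
        elif state == 1:  # seen "<"
            if c == "=":
                return ("relop", "LE"), forward + 1
            if c == ">":
                return ("relop", "NE"), forward + 1
            return ("relop", "LT"), forward  # retract: c is not consumed
        elif state == 6:  # seen ">"
            if c == "=":
                return ("relop", "GE"), forward + 1
            return ("relop", "GT"), forward  # retract: c is not consumed
        forward += 1


for src in ["<=x", "<>", "<a", ">", "=", "a"]:
    print(repr(src), get_relop(src))
```

The returned position is where the next lexeme begins, which is what lexemeBegin would be set to before scanning continues.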
30
3.4.4 Architecture of a Transition-Diagram-Based Lexical Analyzer
Ways the code for individual diagrams can fit into the entire lexical analyzer:
1. Arrange for the transition diagrams for each token to be tried sequentially
2. Run the various transition diagrams "in parallel"
3. Combine all the transition diagrams into one (preferred)
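As a rough illustration of the "in parallel" option, the sketch below runs every recognizer at the same input position and keeps the longest match, preferring the earlier entry on a tie. Compiled regular expressions stand in for the transition diagrams, and the names RECOGNIZERS and longest_match are hypothetical.

```python
import re

# Order matters only for ties: the keyword is listed before id.
RECOGNIZERS = [
    ("if", re.compile(r"if")),
    ("id", re.compile(r"[A-Za-z_][A-Za-z_0-9]*")),
    ("number", re.compile(r"[0-9]+")),
    ("relop", re.compile(r"<=|<>|>=|<|>|=")),
]


def longest_match(text, pos):
    """Return (token_name, end_pos) for the longest lexeme at pos, or None."""
    best = None
    for name, rx in RECOGNIZERS:
        m = rx.match(text, pos)
        # Strictly greater keeps the earlier recognizer on equal length.
        if m and (best is None or m.end() > best[1]):
            best = (name, m.end())
    return best


for src in ["if x", "iffy", "<=", "42", "?"]:
    print(repr(src), longest_match(src, 0))
```

Taking the longest prefix is what keeps iffy from being split into the keyword if followed by fy, while the tie-breaking order is what keeps the exact lexeme if from being reported as an id.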
31
The end of Lecture02