Chapter 3 Lexical Analysis.

Content overview of this chapter:
3.1 The Role of the Lexical Analyzer
3.2 Input Buffering
3.3 Specification of Tokens
3.4 Recognition of Tokens
3.5 The Lexical-Analyzer Generator Lex
3.6 Finite Automata
3.7 From Regular Expressions to Automata
3.8 Design of a Lexical-Analyzer Generator

Overview: How to construct a lexical analyzer?
By hand:
1) Describe the lexemes of each token
2) Identify each occurrence of each lexeme
3) Return information about the token identified
Automatically:
1) Give the lexeme patterns to a lexical-analyzer generator
2) The generator compiles them into a lexical analyzer

3.1 The Role of the Lexical Analyzer
Main tasks of the lexical analyzer:
- Read the input characters
- Group them into lexemes
- Output a sequence of tokens
Other tasks:
- Stripping out comments and whitespace
- Correlating error messages with the source program (e.g. by line number)
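The tasks listed above can be sketched in a few lines of Python. This is only an illustration under assumptions: the token names and patterns in `TOKEN_SPEC` describe a made-up toy language and do not come from the slides.

```python
import re

# Hypothetical token patterns for a toy language (an assumption for illustration).
TOKEN_SPEC = [
    ("comment", r"//[^\n]*"),
    ("ws",      r"[ \t\n]+"),
    ("number",  r"\d+"),
    ("id",      r"[A-Za-z_]\w*"),
    ("op",      r"[-+*/=()]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Read characters, group them into lexemes, yield (token, lexeme, line)."""
    pos, line = 0, 1
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            # The line counter lets the error message point back at the source.
            raise SyntaxError(f"line {line}: unexpected character {source[pos]!r}")
        if m.lastgroup not in ("ws", "comment"):
            yield (m.lastgroup, m.group(), line)   # whitespace and comments are dropped
        line += m.group().count("\n")
        pos = m.end()

tokens = list(tokenize("x = y + 42 // add\nz = 7\n"))
```

The comment and the whitespace never reach the caller, and every token carries the line on which its lexeme started.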

3.1 The Role of the Lexical Analyzer
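The lexical analyzer interacts with:
- Parser: calls the lexical analyzer for the next token during syntax analysis
- Symbol table: where the lexical analyzer enters lexemes such as identifiers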

3.1.1 Lexical Analysis Versus Parsing
Why is the analysis portion of a compiler separated into lexical analysis and parsing (syntax analysis)?
- Design simplicity
- Compiler efficiency
- Compiler portability

3.1.2 Tokens, Patterns, and Lexemes
Token: A pair consisting of a token name and an optional attribute value
Pattern: A description of the form that the lexemes of a token may take
Lexeme: A sequence of characters in the source program that matches the pattern for a token

3.1.2 Tokens, Patterns, and Lexemes
Examples of tokens:

3.1.3 Attributes for Tokens
An attribute describes the lexeme represented by the token.
Example: E = M*C**2
<id, pointer to symbol-table entry for E>
<assign-op>
<id, pointer to symbol-table entry for M>
<mult-op>
<id, pointer to symbol-table entry for C>
<exp-op>
<number, integer value 2>
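A small sketch that produces this token sequence for `E = M*C**2`. The regular expressions are assumptions, a list index stands in for the "pointer to symbol-table entry", and the operator names use underscores because Python group names cannot contain hyphens.

```python
import re

# Token names follow the slide; the patterns themselves are assumptions.
PATTERNS = [
    ("exp_op",    r"\*\*"),   # listed before mult_op so that ** wins over *
    ("mult_op",   r"\*"),
    ("assign_op", r"="),
    ("number",    r"\d+"),
    ("id",        r"[A-Za-z_][A-Za-z_0-9]*"),
    ("ws",        r"\s+"),
]
SCANNER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in PATTERNS))

def scan(source):
    """Return (tokens, symbol_table); characters matching no pattern are skipped."""
    symbol_table = []   # list of lexemes; the index plays the role of a pointer
    tokens = []
    for m in SCANNER.finditer(source):
        name, lexeme = m.lastgroup, m.group()
        if name == "ws":
            continue
        if name == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append((name, symbol_table.index(lexeme)))
        elif name == "number":
            tokens.append((name, int(lexeme)))   # attribute: the integer value
        else:
            tokens.append((name, None))          # operators need no attribute
    return tokens, symbol_table

tokens, table = scan("E = M*C**2")
```

Here `tokens` holds seven pairs, and `table` holds the three identifiers in order of first appearance.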

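3.1.4 Lexical Errors
A lexical analyzer often cannot tell, on its own, that the source contains an error.
e.g. fi(a==f(x))…
The lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier. Since fi is a valid lexeme for the token id, it returns the token id to the parser.
When none of the patterns matches any prefix of the remaining input, the simplest strategy is “panic mode” recovery: delete successive characters from the remaining input until a well-formed token can be found.
Other error-recovery actions:
1. Delete one character from the remaining input
2. Insert a missing character into the remaining input
3. Replace a character by another character
4. Transpose two adjacent characters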
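Panic-mode recovery can be sketched as follows. The token patterns are assumptions chosen for illustration; the point is only that an unmatched character is recorded and deleted, and scanning resumes at the next position.

```python
import re

# Assumed toy patterns; '$' and '#' deliberately match none of them.
TOKEN = re.compile(
    r"(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<op>[=()+*-])|(?P<ws>\s+)"
)

def tokenize_with_panic_mode(source):
    tokens, errors, pos = [], [], 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            # No pattern matches a prefix of the remaining input:
            # report the character, delete it, and try again.
            errors.append((pos, source[pos]))
            pos += 1
            continue
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens, errors

tokens, errors = tokenize_with_panic_mode("fi (a = 3 $# b)")
```

Note that `fi` comes back as an ordinary `id` token. As the slide says, deciding whether it is a misspelled `if` is left to a later phase.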

3.2 Input Buffering
Why do we need input buffers?
- We often have to look one or more characters ahead before we can be sure we have the right lexeme
- Buffering speeds up reading the source program
In this section, we:
- Introduce a two-buffer scheme
- Consider an improvement involving “sentinels”

3.2.1 Buffer Pairs
lexemeBegin: marks the beginning of the current lexeme
forward: scans ahead until a pattern match is found

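3.2.2 Sentinels
A sentinel is a special character that cannot be part of the source program (eof is the natural choice), placed at the end of each buffer. Testing for it combines the buffer-end test with the test for the current character.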
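A rough Python sketch of the two-buffer scheme with sentinels. The class name `TwoBufferReader`, the buffer size `n`, and the choice of `"\0"` as the sentinel are all assumptions. The `lexemeBegin` pointer and retraction are left out to keep the example short.

```python
import io

EOF = "\0"   # sentinel; assumed never to occur in the source text

class TwoBufferReader:
    def __init__(self, stream, n=8):
        self.stream, self.n = stream, n
        # Layout: [half 0: n chars][EOF][half 1: n chars][EOF]
        self.buf = [EOF] * (2 * n + 2)
        self.forward = 0
        self._load(0)

    def _load(self, half):
        start = half * (self.n + 1)
        chunk = self.stream.read(self.n)
        for i, ch in enumerate(chunk):
            self.buf[start + i] = ch
        # After a short read this EOF lands inside the half: real end of input.
        self.buf[start + len(chunk)] = EOF

    def next_char(self):
        """Return the next character, or None at end of input."""
        while True:
            ch = self.buf[self.forward]
            if ch != EOF:                          # the common case: one test only
                self.forward += 1
                return ch
            if self.forward == self.n:             # sentinel ending half 0
                self._load(1)
                self.forward = self.n + 1
            elif self.forward == 2 * self.n + 1:   # sentinel ending half 1
                self._load(0)
                self.forward = 0
            else:
                return None                        # EOF inside a half

def read_all(text, n=4):
    reader = TwoBufferReader(io.StringIO(text), n)
    chars = []
    while (ch := reader.next_char()) is not None:
        chars.append(ch)
    return "".join(chars)
```

For an ordinary character, `next_char` performs a single comparison. The buffer-boundary checks run only when the sentinel is actually seen.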

3.3.1 Strings and Languages
Some concepts:
symbol: letters, digits, and punctuation
alphabet: any finite set of symbols, e.g. {0,1}, ASCII, Unicode
string: a finite sequence of symbols drawn from an alphabet
|s|: length of a string s
∊: the empty string
language: any countable set of strings over some fixed alphabet, e.g. ∅, {∊}, C programs, English sentences

3.3.1 Strings and Languages
Operations on strings:
concatenation: xy
e.g. 1) x = dog, y = house, xy = doghouse
2) ∊s = s∊ = s
exponentiation: s^0 = ∊, and s^i = s^(i-1)s for i > 0
e.g. s^1 = s, s^2 = ss, s^3 = sss
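These string operations map directly onto Python strings. In the sketch below, `power` is a hypothetical helper that follows the usual recursive definition: the zeroth power of a string is the empty string, and each further power appends one more copy.

```python
def power(s, i):
    """Exponentiation of strings: s^0 is the empty string, s^i = s^(i-1) s."""
    if i < 0:
        raise ValueError("exponent must be non-negative")
    return "" if i == 0 else power(s, i - 1) + s

x, y = "dog", "house"
xy = x + y            # concatenation
empty = ""            # the empty string is the identity for concatenation
```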

3.3.2 Operations on Languages
union: L ∪ M = {s | s is in L or s is in M}
concatenation: LM = {st | s is in L and t is in M}
closure:
Kleene closure: L* = L^0 ∪ L^1 ∪ L^2 ∪ …, where L^0 = {∊} and L^i = L^(i-1)L
Positive closure: L+ = L^1 ∪ L^2 ∪ L^3 ∪ …
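For finite languages these operations can be tried out with Python sets. Both closures are infinite in general, so the hypothetical helper `closure_up_to` only builds a finite approximation: the union of the powers of L up to a chosen bound `k`.

```python
def union(L, M):
    return L | M

def concat(L, M):
    return {s + t for s in L for t in M}

def power(L, i):
    result = {""}                 # L^0 contains only the empty string
    for _ in range(i):
        result = concat(result, L)
    return result

def closure_up_to(L, k, positive=False):
    """Union of L^i for i = 0..k (Kleene) or i = 1..k (positive)."""
    out = set()
    for i in range(1 if positive else 0, k + 1):
        out |= power(L, i)
    return out

L = {"a", "b"}
D = {"0", "1"}
```

For example, `concat(L, D)` gives the four strings made of one letter followed by one digit. `closure_up_to(L, 2)` contains the empty string, while the positive variant does not.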

3.3.3 Regular Expressions
Regular expressions describe languages.
e.g. C identifiers: letter_(letter_|digit)*
Notice:
a) Regular expressions are built recursively out of smaller regular expressions
b) Each regular expression r denotes a language L(r)
BASIS: (two rules)
1. ∊ is a regular expression, and L(∊) is {∊}
2. If a is a symbol in Σ, then a is a regular expression, and L(a) = {a}

3.3.3 Regular Expressions
INDUCTION: suppose r and s are regular expressions denoting the languages L(r) and L(s)
1. (r)|(s) is a regular expression denoting the language L(r) ∪ L(s)
2. (r)(s) is a regular expression denoting the language L(r)L(s)
3. (r)* is a regular expression denoting (L(r))*
4. (r) is a regular expression denoting L(r)
Some conventions:
1. * has highest precedence and is left associative
2. Concatenation has second highest precedence and is left associative
3. | has lowest precedence and is left associative
e.g. (a)|((b)*(c)) = a|b*c
regular set: A language that can be defined by a regular expression
equivalent: Two regular expressions r and s are equivalent if they denote the same regular set; we write r = s

3.3.3 Regular Expressions Algebraic laws for regular expressions
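The table of laws is not reproduced in this transcript. As a hedged illustration, the sketch below checks a few standard laws, together with the precedence convention, by comparing the strings two expressions accept. The helper names `language` and `same` are assumptions. The comparison only covers strings up to a fixed length over the alphabet {a, b, c}, so it gives evidence of equivalence, not a proof.

```python
import re
from itertools import product

def language(pattern, alphabet="abc", max_len=4):
    """Strings over `alphabet`, up to `max_len` long, matched in full by `pattern`."""
    rx = re.compile(pattern)
    words = ("".join(p) for n in range(max_len + 1)
             for p in product(alphabet, repeat=n))
    return {w for w in words if rx.fullmatch(w)}

def same(r, s):
    """Bounded equivalence test: do r and s accept the same short strings?"""
    return language(r) == language(s)

checks = {
    "| is commutative":                 same("a|b", "b|a"),
    "| is associative":                 same("a|(b|c)", "(a|b)|c"),
    "concatenation is associative":     same("a(bc)", "(ab)c"),
    "concatenation distributes over |": same("a(b|c)", "ab|ac"),
    "* is idempotent":                  same("(a*)*", "a*"),
    "precedence convention":            same("(a)|((b)*(c))", "a|b*c"),
}
```

Every entry in `checks` comes out true. By contrast, `same("a|b*c", "(a|b)*c")` is false, which shows that moving the parentheses changes the language.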

3.3.4 Regular Definitions
A regular definition is a sequence of definitions of the form:
d1 -> r1
d2 -> r2
…
dn -> rn
where:
1. Each di is a new symbol, not in Σ and not the same as any other of the d's
2. Each ri is a regular expression over the alphabet Σ ∪ {d1, d2, …, di-1}

3.3.4 Regular Definitions
Example: C identifiers
letter_ -> A|B|…|Z|a|b|…|z|_
digit -> 0|1|…|9
id -> letter_(letter_|digit)*
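One way to mirror a regular definition in Python is to build each pattern string from the ones defined before it. The variable names below are illustrative (`id_` avoids shadowing the built-in `id`), and character classes stand in for the long alternations.

```python
import re

# Each name is defined only in terms of names introduced earlier.
letter_ = "[A-Za-z_]"
digit = "[0-9]"
id_ = f"{letter_}({letter_}|{digit})*"

ID = re.compile(id_)

def is_identifier(s):
    """True if the whole string s is a lexeme for id."""
    return ID.fullmatch(s) is not None
```

With these definitions, `is_identifier("_tmp1")` holds, while `is_identifier("9lives")` does not because the first character must match `letter_`.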

3.3.5 Extensions of Regular Expressions
One or more instances: +
1. (r)+ denotes the language (L(r))+
2. r* = r+|∊
3. r+ = rr* = r*r
Zero or one instance: ?
1. r? = r|∊
2. L(r?) = L(r) ∪ {∊}
Character classes:
1. a1|a2|…|an = [a1a2…an]
2. a|b|…|z = [a-z]
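The three extensions combine naturally in a pattern for unsigned numbers. The definition below is an assumed example in the usual textbook style, not something stated on the slide: it uses a character class for digits, `+` for "one or more", and `?` for the optional fraction and exponent.

```python
import re

digit = "[0-9]"                 # character class
digits = f"{digit}+"            # one or more instances
# Optional fraction and optional exponent, each marked with ?
number = f"{digits}(\\.{digits})?(E[+-]?{digits})?"

NUMBER = re.compile(number)

def is_number(s):
    """True if the whole string s matches the assumed number pattern."""
    return NUMBER.fullmatch(s) is not None
```

Strings such as `42`, `3.14` and `6.02E23` are accepted. `.5` and `1.` are rejected because `digits` requires at least one digit on each side of the point.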

3.4 Recognition of Tokens
How to recognize tokens?
Reserved words: if, else, then…
Id: starts with a letter
Number: starts with a digit
Relop: <, >, =, <=, >=, <>…
Ws: blank, tab, newline…

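3.4.1 Transition Diagrams
States: each state represents a condition that could occur while scanning the input for a lexeme that matches one of several patterns
Edges: directed from one state to another, each labeled by a symbol or set of symbols
Some conventions:
1. Accepting or final states: indicate that a lexeme has been found
2. *: retract the forward pointer one position
3. Start or initial state: indicated by an edge labeled "start" entering from nowhere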

3.4.1 Transition Diagrams
Example: A transition diagram that recognizes the lexemes matching the token relop

3.4.2 Recognition of Reserved Words and Identifiers
Two ways to handle reserved words:
1. Install the reserved words in the symbol table initially
2. Create separate transition diagrams for each keyword
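The first approach can be sketched with a dictionary standing in for the symbol table. The function names `install_reserved` and `get_token` are assumptions for illustration. The idea is that any lexeme matching the identifier pattern is looked up, and the table entry says whether it is a keyword or an ordinary id.

```python
symbol_table = {}   # lexeme -> token name

def install_reserved(words):
    """Pre-load the table so keywords are never mistaken for identifiers."""
    for w in words:
        symbol_table[w] = w        # the token name is the keyword itself

def get_token(lexeme):
    """Token name for a lexeme that matched the id pattern.

    Unknown lexemes are installed as ordinary identifiers."""
    return symbol_table.setdefault(lexeme, "id")

install_reserved(["if", "then", "else"])
```

After this setup, `get_token("if")` yields the keyword token, while `get_token("count")` yields `id` and leaves `count` in the table for later lookups.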

3.4.3 Completion of the Running Example
Transition diagram for token number
Transition diagram for whitespace

3.4.4 Architecture of a Transition-Diagram-Based Lexical Analyzer
A sketch of getRelop() to simulate the transition diagram for relop
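The figure itself is not included in this transcript, so the following is a reconstruction under assumptions: the state numbers 0, 1 and 6 and the attribute names LT, LE, NE, EQ, GT, GE are chosen for illustration. The function walks the diagram one character at a time and retracts the forward pointer in the two starred states.

```python
def get_relop(text, pos=0):
    """Simulate a relop transition diagram starting at text[pos].

    Returns (("relop", attribute), next_pos), or (None, pos) if no relop starts here."""
    state = 0
    forward = pos
    while True:
        c = text[forward] if forward < len(text) else ""   # "" means end of input
        forward += 1
        if state == 0:
            if c == "<":
                state = 1
            elif c == "=":
                return ("relop", "EQ"), forward
            elif c == ">":
                state = 6
            else:
                return None, pos          # not a relop: caller tries another diagram
        elif state == 1:                  # seen "<"
            if c == "=":
                return ("relop", "LE"), forward
            if c == ">":
                return ("relop", "NE"), forward
            return ("relop", "LT"), forward - 1    # starred state: retract one position
        elif state == 6:                  # seen ">"
            if c == "=":
                return ("relop", "GE"), forward
            return ("relop", "GT"), forward - 1    # starred state: retract one position
```

Calling `get_relop("<= 3")` consumes two characters. Calling `get_relop("<a")` reads the `a`, decides the lexeme is just `<`, and gives the `a` back by retracting.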

3.4.4 Architecture of a Transition-Diagram-Based Lexical Analyzer
Ways the code can fit into the entire lexical analyzer:
1. Arrange for the transition diagrams for each token to be tried sequentially
2. Run the various transition diagrams "in parallel"
3. Combine all the transition diagrams into one (preferred)
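The second option can be approximated in a few lines. In this sketch each "diagram" is replaced by a compiled regular expression, which is a simplifying assumption, and the recognizer list is hypothetical. Every recognizer is tried at the current position and the longest lexeme wins, with earlier entries winning ties.

```python
import re

# Assumed recognizers; order matters only for ties (keyword before id).
RECOGNIZERS = [
    ("if",     re.compile(r"if")),
    ("id",     re.compile(r"[A-Za-z_][A-Za-z_0-9]*")),
    ("number", re.compile(r"[0-9]+")),
    ("relop",  re.compile(r"<=|>=|<>|<|>|=")),
]

def next_token(text, pos):
    """Try every recognizer at `pos`; return (name, lexeme, end) for the longest match."""
    best = None
    for name, rx in RECOGNIZERS:
        m = rx.match(text, pos)
        if m and (best is None or m.end() > best[2]):
            best = (name, m.group(), m.end())
    return best    # None if nothing matches at this position
```

Taking the longest match is what keeps `iffy` from being split into the keyword `if` followed by `fy`.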

The end of Lecture02
