1
CPSC Compiler Tutorial 2: Scanner & Lex
2
Tokens
Input token stream: each significant lexical chunk of the program is represented by a token.
- Operators & punctuation: { } ! + - = * ; : …
- Keywords: if, while, return, goto
- Identifiers: ID & actual name
- Constants: kind & value; int, floating-point, character, string, …
3
Token – example 1
Input text: if( x >= y ) y = 10;
Token stream: IF LP ID(x) GEQ ID(y) RP ID(y) Assign INT(10) SEMI
4
Parser
Tokens: IF LP ID(x) GEQ ID(y) RP ID(y) Assign INT(10) SEMI
Tree built by the parser:
IfStmt
  >=
    ID(x)
    ID(y)
  assign
    ID(y)
    INT(10)
5
Sample Grammar
program ::= statement | program statement
statement ::= assignStmt | ifStmt
assignStmt ::= id = expr ;
ifStmt ::= if ( expr ) statement
expr ::= id | int | expr + expr
id ::= a | b | … | y | z
int ::= 1 | 2 | … | 9 | 0
Terminal symbols: a, b, …, 1, 2, 0, and the literal symbols if ( ) = ; +. Non-terminal symbols: program, statement, assignStmt, ifStmt, expr, id, int.
6
Why Separate the Scanner and Parser?
- Simplicity & separation of concerns
  - The scanner hides details from the parser (comments, whitespace, input files, etc.)
  - The parser is easier to build because it has a simpler input stream
- Efficiency
  - The scanner can use a simpler, faster design (but it still often consumes a surprising amount of the compiler's total execution time)
7
Principle of Longest Match
In most languages, the scanner should pick the longest possible string to make up the next token if there is a choice.
Example: return apple != banana;
should be recognized as 5 tokens, not more (no tokens for parts of words or identifiers, and not ! and = as separate tokens):
return ID(apple) NEQ ID(banana) SEMI
8
Scanner DFA Example (1)
Start state 1 loops on white space or comments. Its other transitions:
- end of input: accept EOF
- ( → state 2: accept LP
- ) → state 3: accept RP
- ; → state 4: accept SEMI
9
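Scanner DFA Example (2)
Start state 1 loops on white space or comments.
- ! → state 5; then = → state 6: accept NEQ, or any other character → state 7: accept NOT
- < → state 8; then = → state 9: accept LEQ, or any other character → state 10: accept LESS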
10
Scanner DFA Example (3)
Start state 1 loops on white space or comments.
- [0-9] → state 11, which loops on [0-9]; any other character → state 12: accept INT
11
Scanner DFA Example (4)
Start state 1 loops on white space or comments.
- [a-zA-Z] → state 13, which loops on [a-zA-Z]; any other character → state 14: accept ID or keyword
12
Lex/Flex
- Use Flex instead of Lex; use Bison instead of Yacc.
- When compiling, link against the Lex library (-ll; Flex's own library is -lfl):
  flex file.lex
  gcc -o object lex.yy.c -ll
  ./object
13
Lex - Structure
Declarations/Definitions
%%
Rules/Productions, each made of:
- a Lex expression (the pattern)
- white space
- a C statement (the action; optional)
%%
Additional Code/Subroutines
14
Lex – Basic operators
- * : zero or more occurrences
- . : any character except newline
- .* : matches any sequence of characters on a line
- | : separator between alternatives
- + : one or more occurrences (a+ is equivalent to aa*)
- ? : zero or one occurrence (b? is equivalent to b or the empty string)
- [ ] : character class, so [12345] is equivalent to (1|2|3|4|5). Inside brackets, * and + lose their special meaning, so [*+] matches either a star or a plus. [a-zA-Z] matches any letter from a to z or A to Z.
- \ : escape, so \* matches a literal *, and \. matches a period or decimal point