Published by Magdalen Hubbard. Modified over 9 years ago.
1
SCANNING
Chuen-Liang Chen
Department of Computer Science and Information Engineering
National Taiwan University
Taipei, TAIWAN
© Chuen-Liang Chen, NTUCS&IE
2
Scanner (lexical analyzer)
- primary function -- grouping input characters into tokens
- called by -- parser
- return --
  1. token code
  2. attribute (optional)
- theoretical bases -- regular expression, finite automata
- implementation
  - dedicated program (hardwired)
  - table-driven
- construction
  - hand-coded
  - by generator, in order to limit the effort in building a scanner by specifying which tokens the scanner is to recognize
    - program [lex]
    - table + standard driver program [ScanGen]
3
Regular expression (1/2)
- being used to
  - specify simple set of strings (regular set)
  - specify tokens of programming language
  - program a scanner generator
- string -- catenation of characters in vocabulary, denoted V
- regular expression
  - meta-characters: ( ) ' * + ? |
    - have to be quoted when used as ordinary characters
  1. ∅ -- empty set
  2. λ -- set of null string
  3. s -- { string s }
  4. A | B -- alternation of corresponding regular sets
  5. A B -- catenation of corresponding regular sets
  6. A* -- Kleene closure of corresponding regular set
     - repeating zero or more times
4
Regular expression (2/2)
- other notations
  - A+ = A A*
  - A? = A | λ
  - Not(A) = V - A, for set of characters A
  - Not(S) = V* - S, for set of strings S
    - may be infinite but still regular
  - A^k = A A ... A (k times)
- examples
  - comment of the form "-- anything Eol"
      Comment = - - ( Not(Eol) )* Eol
  - fixed decimal literal
      Lit = D+ . D+
  - identifier (begin with letter, end with letter/digit, without consecutive underlines)
      ID = L ( L | D )* ( _ ( L | D )+ )*
- being able to represent all finite sets and many but not all infinite sets
QUIZ: counter example?
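The identifier and fixed decimal literal examples above can be tried out with the POSIX regular expression functions. This is only a sketch: it assumes L means an ASCII letter and D an ASCII digit, and the names `ID_RE`, `LIT_RE`, and `matches` are made up for the illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* ID = L ( L | D )* ( _ ( L | D )+ )*, anchored to the whole string. */
static const char ID_RE[]  = "^[A-Za-z][A-Za-z0-9]*(_[A-Za-z0-9]+)*$";
/* Lit = D+ . D+ (the dot is quoted because it is a meta-character). */
static const char LIT_RE[] = "^[0-9]+\\.[0-9]+$";

/* Returns 1 if s matches pattern, 0 if not, -1 if the pattern is invalid. */
int matches(const char *pattern, const char *s)
{
    regex_t re;
    int ok;

    assert(pattern != NULL && s != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    ok = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

For instance, `matches(ID_RE, "a_b1")` should accept, while `"a__b"` (consecutive underlines) and `"a_"` (ends with an underline) should be rejected.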
5
Finite automata
- being used to recognize the tokens specified by a regular expression
- consisting of
  - a finite set of states
  - a set of transitions labeled with characters in V
  - a start state
  - a set of final states
- transition diagram
- transition table
  - blank: error entry
- deterministic finite automata (DFA)
  - unique transition for a given state and character
  - otherwise, nondeterministic finite automata (NFA)
[Figure: transition diagram with states 1, 2, 3, 4 and edges labeled -, -, Not(Eol), Eol]
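As a rough illustration of a transition table, the four-state automaton for the comment pattern (two dashes, any non-Eol characters, then Eol) might be written as below. The character classes, the use of 0 for the blank (error) entry, and all identifiers are assumptions made for this sketch.

```c
#include <assert.h>

enum { ERR = 0, N_STATES = 5 };          /* state 0 doubles as the error entry */
enum CharClass { C_DASH, C_EOL, C_OTHER, N_CLASSES };

/* Rows are states 1..4 from the diagram; columns are character classes. */
static const int T[N_STATES][N_CLASSES] = {
    /* 0 (unused) */ { ERR, ERR, ERR },
    /* 1 (start)  */ { 2,   ERR, ERR },
    /* 2          */ { 3,   ERR, ERR },
    /* 3          */ { 3,   4,   3   },  /* Not(Eol) loops, Eol finishes */
    /* 4 (final)  */ { ERR, ERR, ERR },
};

static enum CharClass classify(char c)
{
    if (c == '-')  return C_DASH;
    if (c == '\n') return C_EOL;
    return C_OTHER;
}

/* Returns 1 if the whole string s is exactly one comment, else 0. */
int is_eol_comment(const char *s)
{
    int state = 1;

    assert(s != 0);
    for (; *s != '\0'; s++) {
        state = T[state][classify(*s)];
        if (state == ERR)
            return 0;                    /* blank entry: no transition */
    }
    return state == 4;
}
```

Because every (state, class) pair has at most one target, the table describes a DFA; an NFA would need a set of targets per entry.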
6
From RE to NFA
- rules
  - vocabulary: a
  - alternation: combines NFA for A and NFA for B
  - catenation: combines NFA for A and NFA for B
  - Kleene closure: built from NFA for A
[Figure: the NFA fragment for each rule]
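The rule diagrams did not survive in this transcript, so the following is one common (Thompson-style) way to realize them, not necessarily the construction drawn on the slide. Each rule returns a fragment with one start and one accept state; null-string transitions use the label `EPS`. All names and array limits are hypothetical.

```c
#include <assert.h>

#define MAX_STATES 64
#define MAX_EDGES  128
#define EPS '\0'                 /* label used for null-string transitions */

typedef struct { int from, to; char label; } Edge;
typedef struct { int start, accept; } Frag;

static Edge edges[MAX_EDGES];
static int n_edges, n_states;

static int new_state(void)
{
    assert(n_states < MAX_STATES);
    return n_states++;
}

static void add_edge(int from, int to, char label)
{
    assert(n_edges < MAX_EDGES);
    edges[n_edges].from = from;
    edges[n_edges].to = to;
    edges[n_edges].label = label;
    n_edges++;
}

/* vocabulary: a single transition labeled with the character. */
Frag nfa_symbol(char a)
{
    Frag f;
    f.start = new_state();
    f.accept = new_state();
    add_edge(f.start, f.accept, a);
    return f;
}

/* alternation: new start branches to both operands, both join a new accept. */
Frag nfa_alt(Frag a, Frag b)
{
    Frag f;
    f.start = new_state();
    f.accept = new_state();
    add_edge(f.start, a.start, EPS);
    add_edge(f.start, b.start, EPS);
    add_edge(a.accept, f.accept, EPS);
    add_edge(b.accept, f.accept, EPS);
    return f;
}

/* catenation: link the accept state of A to the start state of B. */
Frag nfa_cat(Frag a, Frag b)
{
    Frag f;
    f.start = a.start;
    f.accept = b.accept;
    add_edge(a.accept, b.start, EPS);
    return f;
}

/* Kleene closure: allow skipping A entirely or repeating it. */
Frag nfa_star(Frag a)
{
    Frag f;
    f.start = new_state();
    f.accept = new_state();
    add_edge(f.start, a.start, EPS);
    add_edge(f.start, f.accept, EPS);   /* zero repetitions */
    add_edge(a.accept, a.start, EPS);   /* one more repetition */
    add_edge(a.accept, f.accept, EPS);
    return f;
}
```

With these helpers, `nfa_star(nfa_alt(nfa_symbol('a'), nfa_symbol('b')))` builds an NFA for ( a | b )*. The construction favors simplicity over size: it introduces more states and null-string transitions than strictly needed, which the later NFA-to-DFA and optimization steps remove.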
7
From NFA to DFA
- major operation: λ-closure
- example
  [Figure: an NFA with states 1-5 and transitions on a, b, and a | b, converted step by step into a DFA with states {1,2}, {4,5}, {3,4,5}, and {5}]
  1. λ-closure( 1 ) = 1, 2
  2. λ-closure( 3, 4, 5 ) = 3, 4, 5
  3. λ-closure( 4, 5 ) = 4, 5
  4. λ-closure( 5 ) = 5
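A small sketch of the closure operation over null-string transitions (commonly written ε-closure), together with the move step that the NFA-to-DFA conversion alternates with it. The NFA here is a made-up four-state example, not the one from the slide, and state sets are stored as bit masks, which assumes fewer than 32 states.

```c
#include <assert.h>
#include <stddef.h>

#define EPS '\0'                  /* label for null-string transitions */

typedef struct { int from, to; char label; } Edge;

/* Hypothetical NFA: 0 -eps-> 1 -eps-> 2 -a-> 3, and 0 -b-> 3. */
static const Edge demo_nfa[] = {
    { 0, 1, EPS }, { 1, 2, EPS }, { 2, 3, 'a' }, { 0, 3, 'b' },
};
static const size_t demo_nfa_len = sizeof demo_nfa / sizeof demo_nfa[0];

/* Adds every state reachable from `set` through null-string transitions. */
unsigned closure(const Edge *e, size_t n, unsigned set)
{
    unsigned result = set;
    int changed = 1;

    while (changed) {             /* repeat until no new state is added */
        changed = 0;
        for (size_t i = 0; i < n; i++) {
            assert(e[i].from < 32 && e[i].to < 32);
            if (e[i].label == EPS &&
                (result & (1u << e[i].from)) &&
                !(result & (1u << e[i].to))) {
                result |= 1u << e[i].to;
                changed = 1;
            }
        }
    }
    return result;
}

/* States reachable from `set` by one transition labeled c (c != EPS). */
unsigned move_on(const Edge *e, size_t n, unsigned set, char c)
{
    unsigned result = 0;

    for (size_t i = 0; i < n; i++)
        if (e[i].label == c && (set & (1u << e[i].from)))
            result |= 1u << e[i].to;
    return result;
}
```

Each DFA state is then one such set: the start state is the closure of the NFA start state, and the successor of a set S on character c is `closure(move_on(S, c))`.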
8
DFA optimization
- major operation: partition states into equivalence classes according to
  - final / non-final states
  - transition functions
- example
    ( A B C D E )
    ( A B C D ) ( E )
    ( A B C ) ( D ) ( E )
    ( A C ) ( B ) ( D ) ( E )
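The partitioning can be sketched as repeated refinement. The DFA diagram is missing from this transcript, so the table below is an assumption: it is the familiar five-state DFA for ( a | b )* a b b, chosen because it goes through the same sequence of partitions as the example. All identifiers are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define N_STATES 5                /* A, B, C, D, E as 0..4 */
#define N_SYMS   2                /* a = 0, b = 1 */

/* Assumed DFA for ( a | b )* a b b; E is the only final state. */
static const int delta[N_STATES][N_SYMS] = {
    /* A */ { 1, 2 }, /* B */ { 1, 3 }, /* C */ { 1, 2 },
    /* D */ { 1, 4 }, /* E */ { 1, 2 },
};
static const int is_final[N_STATES] = { 0, 0, 0, 0, 1 };

/* One refinement pass: two states stay together only if they were in the
 * same class and all their transitions lead to the same classes. */
static int refine(const int old[N_STATES], int next[N_STATES])
{
    int count = 0;

    for (int s = 0; s < N_STATES; s++) {
        next[s] = -1;
        for (int t = 0; t < s && next[s] < 0; t++) {
            int same = (old[s] == old[t]);
            for (int c = 0; same && c < N_SYMS; c++)
                same = (old[delta[s][c]] == old[delta[t][c]]);
            if (same)
                next[s] = next[t];
        }
        if (next[s] < 0)
            next[s] = count++;    /* s starts a new class */
    }
    return count;
}

/* Fills cls[s] with the class of state s; returns the number of classes. */
int minimize(int cls[N_STATES])
{
    int next[N_STATES];
    int count = 0;

    for (int s = 0; s < N_STATES; s++)
        cls[s] = is_final[s];     /* initial split: final / non-final */
    for (;;) {
        int new_count = refine(cls, next);
        memcpy(cls, next, sizeof next);
        if (new_count == count)   /* no class was split: done */
            break;
        count = new_count;
    }
    return count;
}
```

On this table the passes split off D, then B, and then stop with A and C still sharing a class, so the optimized DFA has four states.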
9
From DFA to scanner (1/3)
- dedicated program
- example
    if (current_char == '-') {
        current_char = getchar();
        if (current_char == '-') {
            /* Skip the rest of the line; a real scanner must also stop at EOF. */
            do
                current_char = getchar();
            while (current_char != '\n');
        } else {
            ungetc(current_char, stdin);
            lexical_error(current_char);
        }
    } else
        lexical_error(current_char);
    /* Return or process valid token. */
- ungetc() -- lookahead
[Figure: transition diagram with states 1, 2, 3, 4 and edges labeled -, -, Not(Eol), Eol]
10
From DFA to scanner (2/3)
- table-driven
  - transition table + return token code + character save/toss operation + process of valid token
- example
    /*
     * Note: current_char is already set
     * to the current input character.
     */
    state = initial_state;
    while (TRUE) {
        next_state = T[state][current_char];
        if (next_state == ERROR)
            break;
        state = next_state;
        if (current_char == EOF)
            break;
        current_char = getchar();
    }
    if (is_final_state(state)) {
        /* Return or process valid token. */
    } else {
        lexical_error(current_char);
    }
QUIZ: where is “lookahead”?
11
From DFA to scanner (3/3)
- toss operation
- example -- ( " ( Not(") | " " )* " )
QUIZ: how to program?
[Figure: the input " " " H i " " " is returned as " H i "; transitions are labeled T( " ) and Not( " )]
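One possible answer to the quiz, as a hand-coded sketch rather than the intended solution: the opening and closing quotes are tossed, ordinary characters are saved, and of each doubled quote one is saved and one is tossed. Telling a doubled quote from the closing quote takes one character of lookahead. The function name, its buffer-based interface, and the -1 error convention are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Scans a quoted string at the start of `in`, writing the saved characters
 * to `out` (capacity `cap`, including the terminating '\0').
 * Returns the number of input characters consumed, or -1 on error. */
long scan_string(const char *in, char *out, size_t cap)
{
    size_t i = 0, n = 0;

    assert(in != NULL && out != NULL);
    if (cap == 0 || in[i] != '"')
        return -1;
    i++;                                  /* toss the opening quote */
    for (;;) {
        char c = in[i];
        if (c == '\0')
            return -1;                    /* input ended inside the string */
        if (c == '"') {
            if (in[i + 1] == '"') {
                i += 2;                   /* doubled quote: save one, toss one */
            } else {
                i++;                      /* toss the closing quote */
                break;
            }
        } else {
            i++;
        }
        if (n + 1 >= cap)
            return -1;                    /* no room left in out */
        out[n++] = c;
    }
    out[n] = '\0';
    return (long)i;
}
```

Given the eight input characters `"""Hi"""`, the function consumes all eight and leaves the four characters `"Hi"` in the buffer.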
12
Reserved words
- identifiers reserved for particular usage
- approach 1
  - one reserved word, one regular expression
- approach 2
  - exceptions to ordinary identifiers
  - approach used in our simple example
QUIZ: comparison?
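Approach 2 might look like the sketch below: the scanner first recognizes an ordinary identifier, then looks the lexeme up in a small exception table. The token codes mirror the return values in the Lex slides (1 for an identifier, 4 to 7 for begin, end, read, write); the identifiers and the case-insensitive comparison are assumptions of this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum { TOK_ID = 1, TOK_BEGIN = 4, TOK_END = 5, TOK_READ = 6, TOK_WRITE = 7 };

/* Case-insensitive comparison of two whole strings. */
static int equal_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;              /* both must end at the same point */
}

/* Called after an identifier has been scanned: returns the reserved-word
 * token code if the lexeme is an exception, otherwise TOK_ID. */
int classify_identifier(const char *lexeme)
{
    static const struct { const char *word; int code; } reserved[] = {
        { "begin", TOK_BEGIN }, { "end",   TOK_END   },
        { "read",  TOK_READ  }, { "write", TOK_WRITE },
    };

    assert(lexeme != NULL);
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++)
        if (equal_ignore_case(lexeme, reserved[i].word))
            return reserved[i].code;
    return TOK_ID;
}
```

Compared with approach 1, this keeps the automaton small (one identifier pattern) at the cost of a lookup after each identifier; a real scanner with many reserved words would likely replace the linear search with a hash table or a sorted table.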
13
Lexical error recovery
- strategies
  - delete the characters read so far
  - delete the first character
- handling of runaway string
  QUIZ: why need special handling?
    " ( Not("|Eol) | " " )* "
    " ( Not("|Eol) | " " )* Eol
    - print out special error message
- handling of runaway comment
    { Not({|})* }
    { ( Not({|})* { Not({|})* )+ }
    - warning
    { Not(})* Eof
    - error
14
Lex (1/2)
- input file --
    E            [Ee]
    OtherLetter  [A-DF-Za-df-z]
    Digit        [0-9]
    Letter       {E}|{OtherLetter}
    IntLit       {Digit}+
    %%
    [ \t\n]+                        { /* delete */ }
    [Bb][Ee][Gg][Ii][Nn]            { minor=0; return(4); }
    [Ee][Nn][Dd]                    { minor=0; return(5); }
    [Rr][Ee][Aa][Dd]                { minor=0; return(6); }
    [Ww][Rr][Ii][Tt][Ee]            { minor=0; return(7); }
    {Letter}({Letter}|{Digit}|_)*   { minor=0; return(1); }
    {IntLit}                        { minor=1; return(2); }
    ({IntLit}[.]{IntLit})({E}[+-]?{IntLit})?   { minor=2; return(2); }
    \"([^\"\n]|\"\")*\"             { stripquotes(); minor=3; return(2); }
    \"([^\"\n]|\"\")*\n             { stripquotes(); minor=0; return(3); }
    "("                             { minor=0; return(8); }
    ")"                             { minor=0; return(9); }
    ";"                             { minor=0; return(10); }
    ","                             { minor=0; return(11); }
    ":="                            { minor=0; return(12); }
    "+"                             { minor=0; return(13); }
    "-"                             { minor=0; return(14); }
    %%
- notes on the input file
  - action in braces -- executed when RE is matched
  - precedence -- when several rules match the same longest lexeme, lex picks the earliest one, so the reserved words come before the identifier rule
  - left-hand side of each rule -- regular expression
  - classes (E, OtherLetter, Digit, ...) -- to reduce table size
15
Lex (2/2)
- input file (continued) -- auxiliary routine(s)
    /*
     * Strip unwanted quotes from string in yytext;
     * adjust yyleng.
     */
    void stripquotes(void)
    {
        int frompos, topos = 0, numquotes = 2;

        for (frompos = 1; frompos < yyleng; frompos++) {
            yytext[topos++] = yytext[frompos];
            /* A doubled quote is copied once; skip its second half. */
            if (yytext[frompos] == '"' && yytext[frompos+1] == '"') {
                frompos++;
                numquotes++;
            }
        }
        yyleng -= numquotes;
        yytext[yyleng] = '\0';
    }
- output -- a program
- interface --
    int yylex( )
    char yytext[ ];
    int yyleng;