Other Issues (§ 3.9, Not Discussed)
- More advanced algorithm construction: regular expression to DFA directly
Final Notes: R.E. to NFA Construction
- An NFA constructed using the previous techniques may be simulated directly by an algorithm.
- The algorithm's run time is proportional to |N| * |x|, where |N| is the number of states and |x| is the length of the input.
- Alternatively, we can construct a DFA from the NFA and use the resulting Dtran to recognize the input:

  Automaton   Space required   Time to simulate
  NFA         O(|r|)           O(|r| * |x|)
  DFA         O(2^|r|)         O(|x|)

  where |r| is the length of the regular expression.
Pulling Together Concepts
Designing a Lexical Analyzer Generator:
- Reg. Expr. → NFA construction
- NFA → DFA conversion
- DFA simulation for lexical analyzer
Recall Lex Structure:
  Pattern   Action
  …         …
- Each pattern recognizes lexemes
- Each pattern is described by a regular expression, e.g. (abc)*ab, (a | b)*abb, etc.
Recognizer!
Lex Specification → Lexical Analyzer
- Let P1, P2, …, Pn be Lex patterns (regular expressions for valid tokens in the programming language).
- Construct N(P1), N(P2), …, N(Pn).
- Note: the accepting state of N(Pi) will be marked by Pi.
- Construct the combined NFA: a new start state with ε-transitions to the start states of N(P1), N(P2), …, N(Pn).
- Lex applies the conversion algorithm to construct an equivalent DFA!
Pictorially
(a) Lex compiler:
    Lex Specification → Lex Compiler → Transition Table
(b) Schematic lexical analyzer:
    input buffer (holding the lexeme) → FA Simulator, which consults the Transition Table
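Example
3 patterns:
  P1 : a
  P2 : abb
  P3 : a*b+
NFAs (state numbers as used in the state sets of the following slides):
  P1 : start 1 -a-> 2 (accepting)
  P2 : start 3 -a-> 4 -b-> 5 -b-> 6 (accepting)
  P3 : start 7 -a-> 7, 7 -b-> 8, 8 -b-> 8 (8 accepting)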
Example – continued (2)
Combined NFA: new start state 0 with ε-transitions to states 1, 3, and 7.
Examples:
  Input: a a b a
  State set     {0,1,3,7}   {2,4,7}   {7}   {8}   death
  Matched       -           P1        -     P3    -

  Input: a b b
  State set     {0,1,3,7}   {2,4,7}   {5,8}   {6,8}
  Matched       -           P1        P3      P2,P3 (break tie in favor of P2)
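The two traces above can be reproduced with a small simulation. The following is a minimal sketch, not the slides' own code: it assumes the transitions implied by the state sets in the example (1 -a-> 2, 3 -a-> 4 -b-> 5 -b-> 6, 7 -a-> 7, 7 -b-> 8, 8 -b-> 8), stores a state set as a bit mask, and uses the illustrative names `nfa_step`, `pattern_of`, and `longest_match`.

```c
#include <assert.h>
#include <stddef.h>

/* Combined NFA of the example: states 0..8, a state set is a bit mask. */
enum { NSTATES = 9 };
#define BIT(s) (1u << (s))

/* move_a[s] / move_b[s]: states reachable from s on 'a' / 'b' (assumed
   from the state sets shown in the example). */
static const unsigned move_a[NSTATES] = {
    0, BIT(2), 0, BIT(4), 0, 0, 0, BIT(7), 0
};
static const unsigned move_b[NSTATES] = {
    0, 0, 0, 0, BIT(5), BIT(6), 0, BIT(8), BIT(8)
};

/* Epsilon-closure of start state 0; only state 0 has epsilon-moves here. */
static const unsigned START_SET = BIT(0) | BIT(1) | BIT(3) | BIT(7);

/* One step: union of the moves of every state in the current set. */
static unsigned nfa_step(unsigned set, char c)
{
    unsigned next = 0;
    for (int s = 0; s < NSTATES; s++) {
        if (set & BIT(s)) {
            if (c == 'a') next |= move_a[s];
            else if (c == 'b') next |= move_b[s];
        }
    }
    return next;
}

/* Pattern announced by a state set: 0 = none, otherwise 1..3.
   Ties go to the pattern listed first, so P2 wins over P3. */
static int pattern_of(unsigned set)
{
    if (set & BIT(2)) return 1;
    if (set & BIT(6)) return 2;
    if (set & BIT(8)) return 3;
    return 0;
}

/* Scans input, remembering the last accepting set, and stops when the
   set becomes empty ("death"). Returns the pattern number of the longest
   match (0 if none) and stores the lexeme length in *len. */
int longest_match(const char *input, size_t *len)
{
    assert(input != NULL && len != NULL);
    unsigned set = START_SET;
    int best = 0;
    *len = 0;
    for (size_t i = 0; input[i] != '\0'; i++) {
        set = nfa_step(set, input[i]);
        if (set == 0) break;
        int p = pattern_of(set);
        if (p != 0) { best = p; *len = i + 1; }
    }
    return best;
}
```

Calling `longest_match("aaba", &n)` follows the first trace and reports P3 with a lexeme of length 3; `longest_match("abb", &n)` follows the second and reports P2. Each step touches every NFA state once, which is where the |N| * |x| running time comes from.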
Example – continued (3)
Alternatively, construct the DFA (keep track of the correspondence between patterns and new accepting states):

               Input Symbol
  STATE        a          b          Pattern
  {0,1,3,7}    {2,4,7}    {8}        none
  {2,4,7}      {7}        {5,8}      P1
  {8}          -          {8}        P3
  {7}          {7}        {8}        none
  {5,8}        -          {6,8}      P3
  {6,8}        -          {8}        P2 (break tie in favor of P2)
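A table-driven recognizer for this DFA might look like the sketch below. It is illustrative only: the DFA states are numbered 0..5 in the row order of the table ({0,1,3,7}, {2,4,7}, {8}, {7}, {5,8}, {6,8}), and the names `dtran`, `accepts`, and `dfa_longest_match` are assumptions, not part of Lex.

```c
#include <assert.h>
#include <stddef.h>

enum { DEAD = -1, DFA_STATES = 6 };

/* Rows in table order:
   0 = {0,1,3,7}, 1 = {2,4,7}, 2 = {8}, 3 = {7}, 4 = {5,8}, 5 = {6,8}.
   Column 0 is input a, column 1 is input b. */
static const int dtran[DFA_STATES][2] = {
    { 1,    2 },
    { 3,    4 },
    { DEAD, 2 },
    { 3,    2 },
    { DEAD, 5 },
    { DEAD, 2 },
};

/* Pattern attached to each DFA state: 0 = none, otherwise 1..3.
   State 5 = {6,8} is marked P2 because the tie is broken in favor of P2. */
static const int accepts[DFA_STATES] = { 0, 1, 3, 0, 3, 2 };

/* Follows dtran from the start state, remembering the last accepting
   state. Returns the pattern number of the longest match (0 if none)
   and stores the lexeme length in *len. */
int dfa_longest_match(const char *input, size_t *len)
{
    assert(input != NULL && len != NULL);
    int state = 0;
    int best = 0;
    *len = 0;
    for (size_t i = 0; input[i] != '\0'; i++) {
        int col = (input[i] == 'a') ? 0 : (input[i] == 'b') ? 1 : -1;
        if (col < 0) break;               /* symbol outside {a, b} */
        state = dtran[state][col];
        if (state == DEAD) break;
        if (accepts[state] != 0) { best = accepts[state]; *len = i + 1; }
    }
    return best;
}
```

Compared with simulating the NFA, each input character costs a single table lookup, so recognition takes time proportional to |x|; the price is the space for the table, which can grow exponentially in |r| in the worst case.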
Minimizing the Number of States of DFA
1. Construct the initial partition Π of S with two groups: accepting / non-accepting.
2. (Construct Π_new) For each group G of Π do begin
   1. Partition G into subgroups such that two states s, t of G are in the same subgroup iff for all input symbols a, states s and t have transitions on a to states of the same group of Π.
   2. Replace G in Π_new by the set of all these subgroups.
   end
3. Compare Π_new and Π. If equal, set Π_final := Π and proceed to 4; else set Π := Π_new and go to 2.
4. Aggregate the states belonging to each group of Π_final into a single state.
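The refinement loop in steps 1 to 3 can be sketched as follows. This is a minimal illustration under stated assumptions: the DFA used here is not the one from the slides but the standard textbook DFA for (a|b)*abb (states A..E numbered 0..4, E accepting), and the names `minimize`, `example_delta`, and `example_accepting` are hypothetical.

```c
#include <assert.h>

enum { N = 5, NSYM = 2 };  /* assumed DFA: 5 states, inputs a (0) and b (1) */

/* Assumed example: DFA for (a|b)*abb, states A..E numbered 0..4. */
static const int example_delta[N][NSYM] = {
    {1, 2}, {1, 3}, {1, 2}, {1, 4}, {1, 2}
};
static const int example_accepting[N] = {0, 0, 0, 0, 1};

/* Refines the accepting / non-accepting partition until it stops changing.
   On return, group[s] is the index of the minimized state containing s.
   Returns the number of groups in the final partition. */
int minimize(const int delta[][NSYM], const int accepting[], int group[])
{
    int next[N];
    int ngroups = 0;  /* size of the current partition; unknown before pass 1 */

    /* Step 1: initial partition, group 1 = accepting, group 0 = the rest. */
    for (int s = 0; s < N; s++) {
        for (int a = 0; a < NSYM; a++)
            assert(delta[s][a] >= 0 && delta[s][a] < N);
        group[s] = accepting[s] ? 1 : 0;
    }

    for (;;) {
        int count = 0;  /* number of groups in the new partition */

        /* Step 2: s joins the new group of an earlier state t only if both
           are in the same old group and, for every symbol, their
           transitions lead into the same old group. */
        for (int s = 0; s < N; s++) {
            next[s] = -1;
            for (int t = 0; t < s && next[s] < 0; t++) {
                int same = (group[t] == group[s]);
                for (int a = 0; same && a < NSYM; a++)
                    same = (group[delta[t][a]] == group[delta[s][a]]);
                if (same) next[s] = next[t];
            }
            if (next[s] < 0) next[s] = count++;
        }

        /* Step 3: a pass only ever splits groups, so an unchanged group
           count means the partition itself is unchanged. */
        int stable = (count == ngroups);
        for (int s = 0; s < N; s++) group[s] = next[s];
        ngroups = count;
        if (stable) return ngroups;
    }
}
```

For the assumed DFA, `minimize(example_delta, example_accepting, group)` leaves states A and C in the same group and every other state in a group of its own, so step 4 would produce a four-state DFA.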
Example
[Figure: DFA with states A, B, C, D, F and transitions on a and b]
Minimized DFA:
[Figure: two states, {A,C,D} and {B,F}, with transitions on a and b]
Using LEX
Lex Program Structure:

declarations
%%
translation rules
%%
auxiliary procedures

Name the file, e.g., test.lex. Then "lex test.lex" produces the file "lex.yy.c" (a C program).
LEX
Sections, top to bottom: C declarations, declarations, rules, auxiliary procedures.

%{
/* definitions of all constants
   LT, LE, EQ, NE, GT, GE, IF, THEN, ELSE, ... */
%}
letter    [A-Za-z]
digit     [0-9]
id        {letter}({letter}|{digit})*
%%
if        { return(IF); }
then      { return(THEN); }
{id}      { yylval = install_id(); return(ID); }
%%
int install_id()
{
    /* procedure to install the lexeme to the ST */
}
Example of a Lex Program

        int num_lines = 0, num_chars = 0;
%%
\n      { ++num_lines; ++num_chars; }
.       { ++num_chars; }
%%
int main(int argc, char **argv)
{
    ++argv, --argc;   /* skip over program name */
    if (argc > 0)
        yyin = fopen(argv[0], "r");   /* read from the named file */
    else
        yyin = stdin;                 /* no argument: read standard input */
    yylex();
    printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars);
    return 0;
}
Another Example

%{
#include <stdio.h>
%}
WS      [ \t\n]+
%%
[0-9]+                  printf("NUMBER\n");
[a-zA-Z][a-zA-Z0-9]*    printf("WORD\n");
{WS}                    /* do nothing */
.                       printf("UNKNOWN\n");
%%
int main(int argc, char **argv)
{
    ++argv, --argc;   /* skip over program name */
    if (argc > 0)
        yyin = fopen(argv[0], "r");
    else
        yyin = stdin;
    yylex();
    return 0;
}
Concluding Remarks
Focused on the Lexical Analysis Process, including:
- Regular Expressions
- Finite Automata
- Conversion
- Lex
- Interplay among all these various aspects of lexical analysis
Looking Ahead: the next step in the compilation process is Parsing:
- Top-down vs. Bottom-up
- Relationship to Language Theory