Published by Kathlyn Malone. Modified over 9 years ago
1
Lexical Analysis
Mooly Sagiv
msagiv@post.tau.ac.il
Schrierber 317
03-640-7606
Wed 10:00-12:00
html://www.math.tau.ac.il/~msagiv/courses/wcc.html
Textbook: Modern Compiler Implementation in C, Chapter 2
2
A motivating example Create a program that counts the number of lines in a given input file
3
A motivating example solution
        int num_lines = 0;
%%
\n      ++num_lines;
.       ;
%%
int main() { yylex(); printf("# of lines = %d\n", num_lines); return 0; }
4
Subjects
–Roles of lexical analysis
–The straightforward solution: a manual scanner for C
–Regular Expressions
–Finite automata
–From regular languages into finite automata
–Flex
5
Basic Compiler Phases
Source program (string) → lexical analysis → Tokens → syntax analysis → Abstract syntax tree → semantic analysis → Translate → Intermediate representation → Instruction selection → Assembly → Register Allocation → Fin. Assembly
Underlying techniques: Finite automata, Pushdown automata, Memory organization, graph algorithms, Dynamic programming
6
Example
Input string:
a\b := 5 + 3 ;\nb := (print(a, a-1), 10 * a) ;\nprint(b)
Tokens:
id("a") assign num(5) + num(3) ; id("b") assign (print(id("a"), id("a") - num(1)), num(10) * id("a")) ; print(id("b"))
7
Lexical Analysis (Scanning)
Functionality
–input: program text (file)
–output: sequence of tokens
–Read input file
–Identify language keywords and standard identifiers
–Handle include files and macros
–Count line numbers
–Remove whitespace
–Report illegal symbols
–Produce symbol table
8
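A simplified scanner for C
Token nextToken()
{
    int c;                          /* int rather than char, so EOF fits */
loop:
    c = getchar();
    switch (c) {
    case ' ': goto loop;            /* skip blanks */
    case ';': return SemiColumn;
    case '+':
        c = getchar();              /* one character of lookahead */
        switch (c) {
        case '+': return PlusPlus;
        case '=': return PlusEqual;
        default:  ungetc(c, stdin); /* push the lookahead back onto the input */
                  return Plus;
        }
    case '<': ...
    case 'w': ...
    }
}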
9
Automatic Generation of Lexical Analysis
–The matching of input strings can be performed by a finite automaton
–Examples:
  –An automaton for while
  –An automaton for C identifier
  –An automaton for C comment
–The program for the automaton is automatically generated from regular expressions
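As an illustration of the "automaton for C identifier" item, here is a minimal hand-coded sketch in C. The function name match_identifier and the two-state numbering are assumptions made for this example; they do not come from the slides. The pattern implemented is [a-zA-Z_][a-zA-Z_0-9]*.

```c
#include <ctype.h>
#include <stddef.h>

/* Hand-coded DFA for [a-zA-Z_][a-zA-Z_0-9]*.
 * State 0 = start, state 1 = inside an identifier (accepting).
 * Returns the length of the longest identifier prefix of s, or 0 if none.
 * (Hypothetical helper, written only for illustration.) */
size_t match_identifier(const char *s)
{
    int state = 0;
    size_t i = 0;

    for (;;) {
        unsigned char c = (unsigned char)s[i];
        if (state == 0) {
            if (isalpha(c) || c == '_')
                state = 1;              /* first character accepted */
            else
                return 0;               /* dead state: not an identifier */
        } else if (!(isalnum(c) || c == '_')) {
            return i;                   /* leaving the accepting state */
        }
        i++;
    }
}
```

A generated scanner encodes the same transitions in a table instead of in if statements.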
10
Flex
Input – regular expressions and actions (C code)
Output – a scanner program that reads the input and applies actions when an input regular expression is matched
regular expressions → flex → scanner
input program → scanner → tokens
11
Regular Expression Notations
a           An ordinary character stands for itself
M|N         M or N
MN          M followed by N
M*          Zero or more occurrences of M
M+          One or more occurrences of M
M?          Zero or one occurrence of M
[a-zA-Z]    Character set alternation (single character)
.           Any (single) character but newline
"a.+"       Quotation
\           Convert an operator into text
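Most of these operators can be tried out without flex by using the POSIX regex library, which is a different tool from the one on the slides. The sketch below assumes a POSIX system providing <regex.h>; the helper name re_match is hypothetical. Note that POSIX extended syntax has no "..." quotation, so the backslash escape is used instead, and the patterns are anchored with ^ and $ to force a whole-string match.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if text matches the POSIX extended regular expression,
 * 0 if it does not, and -1 if the pattern does not compile.
 * (Hypothetical helper, written only for illustration.) */
int re_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);   /* 0 means "matched" */
    regfree(&re);
    return rc == 0;
}
```

For example, re_match("^(if|while)$", "while") exercises M|N, and re_match("^a\\.c$", "abc") shows the backslash turning the . operator into plain text.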
12
Ambiguity Resolving
–Find the longest matching token
–Between two tokens with the same length, use the one declared first
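The two rules can be sketched in C as follows. This is not how flex is implemented (flex compiles all rules into one automaton); it is a naive illustration in which every rule is tried separately. The names pick_token, struct rule and the three matchers are assumptions, and the character ranges assume an ASCII-compatible character set.

```c
#include <stddef.h>
#include <string.h>

enum token { T_ERROR, T_IF, T_ID, T_NUM };

/* Each matcher returns the length of the longest prefix of s that it
 * accepts, or 0 when it does not match at all. */
static size_t match_if(const char *s)
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_id(const char *s)       /* [a-z][a-z0-9]* */
{
    size_t n = 0;
    if (s[0] < 'a' || s[0] > 'z')
        return 0;
    while ((s[n] >= 'a' && s[n] <= 'z') || (s[n] >= '0' && s[n] <= '9'))
        n++;
    return n;
}

static size_t match_num(const char *s)      /* [0-9]+ */
{
    size_t n = 0;
    while (s[n] >= '0' && s[n] <= '9')
        n++;
    return n;
}

struct rule { enum token tok; size_t (*match)(const char *); };

/* Declaration order matters: IF is declared before ID. */
static const struct rule rules[] = {
    { T_IF, match_if }, { T_ID, match_id }, { T_NUM, match_num },
};

/* Picks the token for the start of s and stores its length in *len. */
enum token pick_token(const char *s, size_t *len)
{
    enum token best = T_ERROR;
    size_t i;

    *len = 0;
    for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i].match(s);
        if (n > *len) {     /* strictly longer wins; a tie keeps the earlier rule */
            *len = n;
            best = rules[i].tok;
        }
    }
    return best;
}
```

With this sketch the input "if" yields T_IF (a tie of length 2, resolved by declaration order), while "iffy" yields T_ID because the identifier match is longer.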
13
A Flex specification of C Scanner
Letter  [a-zA-Z_]
Digit   [0-9]
%%
[ \t]       { ; }
[\n]        { line_count++; }
";"         { return SemiColumn; }
"++"        { return PlusPlus; }
"+="        { return PlusEqual; }
"+"         { return Plus; }
"while"     { return While; }
{Letter}({Letter}|{Digit})*   { return Id; }
"<="        { return LessOrEqual; }
"<"         { return LessThen; }
14
Running Example
if                                   { return IF; }
[a-z][a-z0-9]*                       { return ID; }
[0-9]+                               { return NUM; }
[0-9]+"."[0-9]*|[0-9]*"."[0-9]+      { return REAL; }
(\-\-[a-z]*\n)|(" "|\n|\t)           { ; }
.                                    { error(); }
15
int edges[][256] = {
/*               ..., 0, 1, 2, 3, ..., -, e, f, g, h, i, j, ... */
/* state 0 */   {0, ..., 0, 0, ..., 0, 0, 0, 0, 0, ..., 0, 0, 0, 0, 0, 0},
/* state 1 */   {13, ..., 7, 7, 7, 7, ..., 9, 4, 4, 4, 4, 2, 4, ..., 13, 13},
/* state 2 */   {0, ..., 4, 4, 4, 4, ..., 0, 4, 3, 4, 4, 4, 4, ..., 0, 0},
/* state 3 */   {0, ..., 4, 4, 4, 4, ..., 0, 4, 4, 4, 4, 4, 4, ..., 0, 0},
/* state 4 */   {0, ..., 4, 4, 4, 4, ..., 0, 4, 4, 4, 4, 4, 4, ..., 0, 0},
/* state 5 */   {0, ..., 6, 6, 6, 6, ..., 0, 0, 0, 0, 0, 0, 0, ..., 0, 0},
/* state 6 */   {0, ..., 6, 6, 6, 6, ..., 0, 0, 0, 0, 0, 0, 0, ..., 0, 0},
/* state 7 */   ...
/* state 13 */  {0, ..., 0, 0, 0, 0, ..., 0, 0, 0, 0, 0, 0, 0, ..., 0, 0}
};
16
Pseudo Code for Scanner
Token nextToken()
{
    lastFinal = 0;
    currentState = 1;
    inputPositionAtLastFinal = input;
    currentPosition = input;
    while (not(isDead(currentState))) {
        nextState = edges[currentState][character at currentPosition];
        advance currentPosition;
        if (isFinal(nextState)) {
            lastFinal = nextState;
            inputPositionAtLastFinal = currentPosition;
        }
        currentState = nextState;
    }
    input = inputPositionAtLastFinal;
    return action[lastFinal];
}
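A runnable C sketch of the same loop is given below. It assumes a much smaller automaton than the one on the slides: only IF, ID and NUM, with hypothetical state names (START, S_I, S_IF, S_ID, S_NUM) and a table filled in by an init_edges helper. State 0 is the dead state and state 1 the start state, as in the pseudo code.

```c
#include <stddef.h>

enum { DEAD = 0, START = 1, S_I = 2, S_IF = 3, S_ID = 4, S_NUM = 5, NSTATES = 6 };
enum token { T_NONE, T_IF, T_ID, T_NUM };

/* Zero-initialised, so every entry not set below leads to DEAD. */
static int edges[NSTATES][256];

/* action[s] is the token recognised in state s (T_NONE if s is not final). */
static const enum token action[NSTATES] = {
    T_NONE, T_NONE, T_ID, T_IF, T_ID, T_NUM
};

void init_edges(void)
{
    int c;
    for (c = 'a'; c <= 'z'; c++) {
        edges[START][c] = S_ID;
        edges[S_I][c] = edges[S_IF][c] = edges[S_ID][c] = S_ID;
    }
    for (c = '0'; c <= '9'; c++) {
        edges[START][c] = S_NUM;
        edges[S_NUM][c] = S_NUM;
        edges[S_I][c] = edges[S_IF][c] = edges[S_ID][c] = S_ID;
    }
    edges[START]['i'] = S_I;    /* "i" may be the start of the keyword */
    edges[S_I]['f'] = S_IF;     /* "if" seen so far */
}

/* Scans one token at *input, moves *input past it and returns the token.
 * Returns T_NONE and leaves *input unchanged when nothing matches. */
enum token next_token(const char **input)
{
    const char *pos = *input;
    const char *pos_at_last_final = *input;
    int last_final = DEAD;
    int state = START;

    while (state != DEAD) {
        state = edges[state][(unsigned char)*pos];
        pos++;
        if (action[state] != T_NONE) {  /* remember the longest match so far */
            last_final = state;
            pos_at_last_final = pos;
        }
    }
    *input = pos_at_last_final;
    return action[last_final];
}
```

After calling init_edges() once, scanning "if8" returns T_ID with three characters consumed: the scanner passes through the final state for IF but keeps going and remembers the later, longer match. Whitespace is not handled in this sketch, so the caller has to skip it.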
17
Example Input: “if --not-a-com”
18
Efficient Scanners
–Efficient state representation
–Input buffering
–Using switch and goto instead of tables
19
Constructing Automaton from Specification
–Create a non-deterministic finite automaton (NDFA) from every regular expression
–Merge all the automata using epsilon moves (like the | construction)
–Construct a deterministic finite automaton (DFA)
–Minimize the automaton, starting with separate accepting states
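The merge and determinization steps can be sketched in C on a tiny, made-up NDFA: state 0 is a new start state with epsilon moves into an automaton for if (states 1 to 3) and one for [a-z]+ (states 4 and 5). The bitmask representation and the names eps, step, closure and dfa_edge are assumptions of this sketch. It shows only the core operation of the subset construction, not the full worklist algorithm and not minimization.

```c
#include <stdint.h>

enum { N = 6 };     /* NDFA states 0..5; bit i of a set stands for state i */

/* eps[s]: states reachable from s by a single epsilon move.
 * State 0 is the merged start state, with epsilon moves to 1 and 4. */
static const uint32_t eps[N] = { (1u << 1) | (1u << 4), 0, 0, 0, 0, 0 };

/* One labelled move from a single NDFA state.
 * 1 -i-> 2 -f-> 3 (accepts IF); 4 -[a-z]-> 5 -[a-z]-> 5 (accepts ID). */
static uint32_t step(int s, char c)
{
    int lower = (c >= 'a' && c <= 'z');
    switch (s) {
    case 1:  return c == 'i' ? 1u << 2 : 0;
    case 2:  return c == 'f' ? 1u << 3 : 0;
    case 4:
    case 5:  return lower ? 1u << 5 : 0;
    default: return 0;
    }
}

/* Epsilon closure: keep adding epsilon successors until nothing changes. */
uint32_t closure(uint32_t set)
{
    uint32_t prev;
    do {
        int s;
        prev = set;
        for (s = 0; s < N; s++)
            if (set & (1u << s))
                set |= eps[s];
    } while (set != prev);
    return set;
}

/* One DFA transition: closure of every state reachable from set on c. */
uint32_t dfa_edge(uint32_t set, char c)
{
    uint32_t out = 0;
    int s;
    for (s = 0; s < N; s++)
        if (set & (1u << s))
            out |= step(s, c);
    return closure(out);
}
```

Starting from closure(1u), which is the set {0, 1, 4}, reading i and then f leads to the set {3, 5}. That DFA state contains the accepting states of both IF and ID, and the rule declared first decides which token it reports.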
20
NDFA Construction
if                                   { return IF; }
[a-z][a-z0-9]*                       { return ID; }
[0-9]+                               { return NUM; }
[0-9]+"."[0-9]*|[0-9]*"."[0-9]+      { return REAL; }
(\-\-[a-z]*\n)|(" "|\n|\t)           { ; }
.                                    { error(); }
21
DFA Construction
22
Minimization
23
%{
/* C declarations */
#include <stdlib.h>     /* atoi, atof */
#include "tokens.h"     /* Mapping of tokens into integers */
#include "errormsg.h"   /* Shared by all the phases */
union {int ival; string sval; double fval;} yylval;
int charPos=1;
#define ADJ (EM_tokPos=charPos, charPos+=yyleng)
%}
/* Lex Definitions */
digits [0-9]+
%%
if                      { ADJ; return IF; }
[a-z][a-z0-9]*          { ADJ; yylval.sval=String(yytext); return ID; }
{digits}                { ADJ; yylval.ival=atoi(yytext); return NUM; }
({digits}\.{digits}?)|({digits}?\.{digits})   { ADJ; yylval.fval=atof(yytext); return REAL; }
(\-\-[a-z]*\n)|([\n\t]|" ")+   { ADJ; }
.                       { ADJ; EM_error("illegal character"); }
24
Start States
–Writing a regular expression for some tokens may be more complicated than writing an automaton
  –C comments
–Solutions
  –Conversion of automata into regular expressions
  –Start States
%start s1 s2
%%
<INITIAL>r1   { action0; BEGIN s1; }
<s1>r1        { action1; BEGIN s2; }
<s2>r2        { action2; BEGIN INITIAL; }
25
Realistic Example
%start Comment
%%
<INITIAL>"/*"   { BEGIN Comment; }
<INITIAL>r1     { Usual actions; }
<INITIAL>r2     { Usual actions; }
...
<INITIAL>rk     { Usual actions; }
<Comment>"*/"   { BEGIN INITIAL; }
<Comment>.|\n   ;
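What a start state does can be imitated by hand with an explicit state variable. The C sketch below is an illustration only: the function name strip_comments and the enum names are assumptions, and the "usual actions" are reduced to copying the character. It also ignores string literals, which a real C scanner would have to treat separately.

```c
#include <stddef.h>

enum start_state { INITIAL_STATE, COMMENT_STATE };

/* Copies src to dst while dropping C-style comments.
 * dst must have room for strlen(src) + 1 bytes.
 * Returns the number of characters written, not counting the final NUL.
 * (Hypothetical helper, written only for illustration.) */
size_t strip_comments(const char *src, char *dst)
{
    enum start_state state = INITIAL_STATE;
    size_t n = 0;

    while (*src != '\0') {
        if (state == INITIAL_STATE) {
            if (src[0] == '/' && src[1] == '*') {
                state = COMMENT_STATE;      /* comment opener: BEGIN Comment */
                src += 2;
            } else {
                dst[n++] = *src++;          /* stands in for the usual actions */
            }
        } else {
            if (src[0] == '*' && src[1] == '/') {
                state = INITIAL_STATE;      /* comment closer: BEGIN INITIAL */
                src += 2;
            } else {
                src++;                      /* any other character: discard */
            }
        }
    }
    dst[n] = '\0';
    return n;
}
```

The state variable plays the role of the current start state: the same input character triggers a different action depending on whether the scanner is inside a comment.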
26
Summary
–For most programming languages, lexical analyzers can be easily constructed
–Exceptions:
  –Fortran
  –PL/1
–Flex is a useful tool beyond compilers
© 2025 SlidePlayer.com. Inc.
All rights reserved.