COMMONWEALTH OF AUSTRALIA Copyright Regulations 1969 WARNING This material has been reproduced and communicated to you by or on behalf of Monash University.


1 COMMONWEALTH OF AUSTRALIA Copyright Regulations 1969 WARNING This material has been reproduced and communicated to you by or on behalf of Monash University pursuant to Part VB of the Copyright Act 1968 (the Act). The material in this communication may be subject to copyright under the Act. Any further reproduction or communication of this material by you may be the subject of copyright protection under the Act. Do not remove this notice.

2 Lexical Analysis CSE2303 Formal Methods I Lecture 6

3 Overview Lexical Analyzer Implementing Finite Automata Introduction to flex

4 Simple Calculator Example Tokens are: –Numbers, operators, spaces, and newlines -2.45 + 3.98 * 0.456\n -2.45+3.98*0.456\n

5 C Example Tokens are: –variables, keywords, constants, operators, etc. main(){printf("Hello\n");}

6 Lexical Analyzer Reads the input one character at a time. Splits the input up into tokens. Implemented using a Finite Automaton or NFA.

7 Matching a Regular Expression Write a C function yylex which reads in a character string, consisting of a’s and b’s, one character at a time, and identifies whether or not the string matches the following regular expression: (a + bb + baa*b)*(baa*)

8 Matching a Regular Expression (clarification) –If no characters can be read then yylex should return 0. –If the string matches (a + bb + baa*b)*(baa*) then yylex should return 1. –Otherwise yylex should return 2.

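9 [Diagram: finite automaton with start state 0 (-), state 1, and final state 2 (+). Transitions: 0 –a→ 0, 0 –b→ 1, 1 –b→ 0, 1 –a→ 2, 2 –a→ 2, 2 –b→ 0.]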

10 #include <stdio.h>
#define END 0
#define MATCH 1
#define OTHER 2
/* Table-driven automaton: rows are states, column 0 is 'a', column 1 is 'b'. */
int yylex() {
    int currentState = 0;
    int table[3][2] = {{0, 1}, {2, 0}, {2, 0}};
    int c = getchar();
    if (c == EOF) return END;
    while (c != EOF) {
        if (c == 'a') currentState = table[currentState][0];
        else if (c == 'b') currentState = table[currentState][1];
        c = getchar();
    }
    if (currentState == 2) return MATCH;   /* state 2 is the final state */
    return OTHER;
}
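The same table can be exercised without standard input. The following is a minimal sketch, assuming a hypothetical helper match_string (not part of the lecture) that runs the automaton over a NUL-terminated string. Unlike the slide's yylex, which skips characters other than a and b, this sketch assumes such characters should cause a rejection.

```c
#include <stddef.h>

/* Return codes mirror the slide: MATCH if the whole string is accepted. */
enum { MATCH = 1, OTHER = 2 };

/* Hypothetical helper (not from the lecture): run the three-state
   automaton over a NUL-terminated string instead of standard input. */
int match_string(const char *s)
{
    /* Rows are states 0..2; column 0 is the move on 'a', column 1 on 'b'. */
    static const int table[3][2] = { {0, 1}, {2, 0}, {2, 0} };
    int state = 0;                        /* state 0 is the start state */

    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == 'a')
            state = table[state][0];
        else if (s[i] == 'b')
            state = table[state][1];
        else
            return OTHER;                 /* assumption: reject other characters */
    }
    return state == 2 ? MATCH : OTHER;    /* state 2 is the only final state */
}
```

Calling match_string("babba") walks the states 0, 1, 2, 0, 1, 2 and ends in the final state, so the string is accepted; the empty string ends in state 0 and is rejected.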

11 Word identification Write a C function yylex which reads in a character string one character at a time and identifies the following tokens: –newline, –space, and –word.

12 Word identification (clarification) –If no characters can be read then yylex should return 0. –If yylex reads a newline it should return 1. –If yylex reads a space (blank or tab) then it should return 2. –Otherwise, after yylex has read a word it should return 3.

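13 [Diagram: finite automaton for word identification, with a start state (-) and final states (+). Edge labels: space; newline; non-white; space, newline.]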

14 #include <stdio.h>
enum {END = 0, NEWLINE = 1, SPACE = 2, WORD = 3};
int yylex() {
    int c = getchar();
    if (c == EOF) return END;
    if (c == '\n') return NEWLINE;
    if (c == ' ' || c == '\t') return SPACE;
    /* c starts a word: read on until white space or end of input */
    while (c != EOF) {
        if (c == ' ' || c == '\t' || c == '\n') {
            ungetc(c, stdin);   /* push the delimiter back for the next call */
            return WORD;
        }
        c = getchar();
    }
    return WORD;
}
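A string-based variant makes the token sequence easy to check. This is only a sketch: next_token and its position pointer are hypothetical names, not part of the lecture. Instead of calling ungetc, it simply stops before the delimiter so that the next call sees it.

```c
/* Token codes as on the slide. */
enum { END = 0, NEWLINE = 1, SPACE = 2, WORD = 3 };

/* Hypothetical string-based analogue of yylex: *p is the current read
   position and is advanced past the token that was recognised. */
int next_token(const char **p)
{
    const char *s = *p;
    int token;

    if (*s == '\0')
        return END;                       /* nothing left to read */

    if (*s == '\n') {
        token = NEWLINE;
        s++;
    } else if (*s == ' ' || *s == '\t') {
        token = SPACE;
        s++;
    } else {
        /* A word is a maximal run of non-white characters. The delimiter
           is left unread, which plays the role of ungetc on the slide. */
        while (*s != '\0' && *s != ' ' && *s != '\t' && *s != '\n')
            s++;
        token = WORD;
    }

    *p = s;
    return token;
}
```

For the input "hi there\n", repeated calls yield WORD, SPACE, WORD, NEWLINE and then END.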

15 flex Lexical analyzer generator –It writes a lexical analyzer Assumption –each token matches a regular expression Needs –set of regular expressions –for each expression an action Produces –A C program

16 Kleene’s Theorem Kleene’s Theorem is the basis of flex The input for flex –A set of patterns and corresponding actions Patterns –are regular expressions –by Kleene’s Theorem these can be represented as FA. FA –are easy to implement

17 Process Write a flex program: example.l. Run it through flex: flex example.l. The file lex.yy.c contains the function yylex. Compile your C programs with the flag -lfl. [Diagram: example.l → flex → lex.yy.c]

18 A flex Program … definitions … %% … rules … %% … subroutines …

19 Sections … definition section –Code between %{ … %} is copied to the generated C file. –Definitions are used to name long expressions … rule section –Each rule has a pattern and an action. –The patterns are regular expressions … subroutine section –Consists of the user’s subroutines. –Copied after the end of the flex generated code

20 flex program for (a + bb + baa*b)*(baa*):
%{
enum {MATCH = 1, OTHER = 2};
%}
R (a|bb|baa*b)*(baa*)
%%
{R} {return MATCH;}
[ab]+ {return OTHER;}
%%

21 match.c:
int main() { int val; … val = yylex(); … }
match.l:
%{
enum {MATCH = 1, OTHER = 2};
%}
R (a|bb|baa*b)*(baa*)
%%
{R} {return MATCH;}
[ab]+ {return OTHER;}
%%
Build:
flex match.l
gcc -o match match.c lex.yy.c -lfl

22 Word Identification
%{
enum {NEWLINE = 1, SPACE = 2, WORD = 3};
%}
%%
\n {return NEWLINE;}
[ \t] {return SPACE;}
[^ \t\n]+ {return WORD;}
%%

23 Global Variables char* yytext –Contains the text of the current token. int yyleng –Holds the length of the current token.

24 Word Count
%{
long charCount = 0;
long wordCount = 0;
long lineCount = 0;
%}
%%
\n {charCount++; lineCount++;}
[ \t] charCount++;
[^ \t\n]+ {wordCount++; charCount += yyleng;}
%%
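To see what the three word-count rules compute, here is a hand-written C sketch of the same counting logic applied to a string. The struct counts type and the count_text function are hypothetical names for illustration; the local variable len stands in for the role yyleng plays in the third rule.

```c
#include <stddef.h>

/* Hypothetical result type holding the three counters from the slide. */
struct counts {
    long chars;
    long words;
    long lines;
};

/* Hand-written equivalent of the three rules, applied to a string. */
struct counts count_text(const char *s)
{
    struct counts c = { 0, 0, 0 };
    size_t i = 0;

    while (s[i] != '\0') {
        if (s[i] == '\n') {                     /* rule: \n */
            c.chars++;
            c.lines++;
            i++;
        } else if (s[i] == ' ' || s[i] == '\t') {   /* rule: [ \t] */
            c.chars++;
            i++;
        } else {                                /* rule: [^ \t\n]+ */
            size_t len = 0;                     /* plays the role of yyleng */
            while (s[i + len] != '\0' && s[i + len] != ' ' &&
                   s[i + len] != '\t' && s[i + len] != '\n')
                len++;
            c.words++;
            c.chars += (long)len;
            i += len;
        }
    }
    return c;
}
```

For the text "hello world\nfoo\n" this gives 16 characters, 3 words and 2 lines.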

25 Important To see all characters you can use od: od -bc filename To remove carriage returns you can use: dos2unix unix_filename or tr -d '\r' < filename > unix_filename flex does not treat carriage returns specially: a '\r' is matched like any other character.
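If neither tool is available, carriage returns can also be removed in C before the text reaches the lexical analyzer. The sketch below assumes a hypothetical helper strip_cr that edits a NUL-terminated buffer in place; it does the same job as tr -d '\r'.

```c
#include <stddef.h>

/* Hypothetical helper: remove every '\r' from buf in place and
   return the length of the resulting string. */
size_t strip_cr(char *buf)
{
    size_t out = 0;

    for (size_t in = 0; buf[in] != '\0'; in++) {
        if (buf[in] != '\r')
            buf[out++] = buf[in];   /* keep everything except '\r' */
    }
    buf[out] = '\0';
    return out;
}
```

Applied to the buffer "a\r\nb\r\n", it leaves "a\nb\n" and returns 4.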

26 More Information Check the courseware web site. Man pages –login and type: xman flex & Library –J.Levine, et al., “lex & yacc”. –A.V. Aho and J.D. Ullman, “Principles of Compiler Design”.

27 Revision Understand what a lexical analyzer does. Know how to implement a finite automaton. Be able to use flex.

