241-437 Compilers: lex/3. Compiler Structures: 3. Lex (241-437, Semester 1, 2011-2012)


Slide 1: 241-437 Compilers: lex/3
Compiler Structures
3. Lex
241-437, Semester 1, 2011-2012
Objectives
- describe lex
- give many examples of lex's use

Slide 2: Overview
1. What is lex (and flex)?
2. Lex Program Format
3. Removing Whitespace (white.l)
4. Printing Line Numbers (linenos.l)
5. Counting (counter.l)
6. Counting IDs (ids.l)
7. Matching Rules
8. More Information

Slide 3: 1. What is lex (and flex)?
- lex is a lexical analyzer generator
  - flex is a fast version of lex, which we'll be using
- lex translates REs (regular expressions) into C code
- The generated code is easy to integrate into C compilers (and other applications).

Slide 4: Uses for Lex
- Convert input from one form to another.
- Extract information from text files.
- Extract tokens for a syntax analyzer.

Slide 5: Using Lex
[Diagram]
    lex source program (lex.l)  ->  lex (flex)   ->  lex.yy.c
    lex.yy.c                    ->  C compiler   ->  a.out
    input stream of chars       ->  a.out        ->  sequence of tokens

Slide 6: Running Flex
With UNIX:
    > flex foo.l
    > gcc -Wall -o foo lex.yy.c
    > ./foo < inputfile.txt
- You may need to include -ll (or -lfl for flex) in the gcc call.
  - it links in the lex library
- You may get "warning" messages from gcc.

Slide 7: How Lex Works
- The lex-generated program (e.g. foo) reads characters from stdin and tries to match a sequence of them against its REs.
- Once it has matched a sequence, it reads in more characters for the next RE match.

Slide 8: 2. Lex Program Format
A lex program has three sections, separated by %% lines:
    REs and/or C code
    %%
    RE/action rules
    %%
    C functions

Slide 9: A Lex Program
    %{
    int charCount=0, wordCount=0, lineCount=0;
    %}
    word [^ \t\n]*
    %%
    {word}    { wordCount++; charCount += yyleng; }
    [\n]      { charCount++; lineCount++; }
    .         { charCount++; }
    %%
    int main(void)
    { yylex();
      printf("Chars %d, Words: %d, Lines: %d\n",
             charCount, wordCount, lineCount);
      return 0;
    }
Sections, top to bottom: 1) C code and REs, 2) RE/action rules, 3) C functions.

Slide 10: Section 1: Defining a RE
Format:
    name    RE
Examples:
    digit     [0-9]
    letter    [A-Za-z]
    id        {letter}({letter}|{digit})*
    word      [^ \t\n]*

Slide 11: Regular Expressions in Lex
    x           match the char x
    \.          match the char .
    "string"    match contents of string of chars
    .           match any char except \n
    ^           match beginning of a line
    $           match the end of a line
    [xyz]       match one char x, y, or z
    [^xyz]      match any char except x, y, and z
    [a-z]       match one of a to z

Slide 12: Regular Expressions in Lex (continued)
    r*          closure (match 0 or more r's)
    r+          positive closure (match 1 or more r's)
    r?          optional (match 0 or 1 r)
    r1 r2       match r1 then r2 (concatenation)
    r1 | r2     match r1 or r2 (union)
    ( r )       grouping
    r1 / r2     match r1 when followed by r2
    { name }    match the RE defined by name

Slide 13: Example REs (Again)
    [0-9]
        A single digit.
    [0-9]+
        An integer.
    [0-9]+(\.[0-9]+)?
        An integer or floating point number.
    [+-]?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?
        Integer, floating point, or scientific notation.
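The last RE on this slide can be tried outside lex. The sketch below is not lex code: it assumes a POSIX system with <regex.h>, and is_number is a hypothetical helper name. The pattern is anchored with ^ and $ so that the whole string has to match, which lex itself does not require.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* The slide's last RE, anchored so the entire string must match. */
static const char *NUMBER_RE =
    "^[+-]?[0-9]+(\\.[0-9]+)?([eE][+-]?[0-9]+)?$";

/* Hypothetical helper: true if text is an integer, a floating point
 * number, or a number in scientific notation. */
bool is_number(const char *text)
{
    assert(text != NULL);
    regex_t re;
    if (regcomp(&re, NUMBER_RE, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                 /* pattern failed to compile */
    bool ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

For instance, is_number("6.02e+23") is true, while is_number("3.") is false because the optional fraction needs at least one digit after the dot.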

Slide 14: Section 2: RE/Action Rule
A rule has the form:
    {name}    { action }
- the name must be defined in section 1 (an RE can also be written directly, as with [\n] and . in the earlier example)
- the action is any C code
If the RE matches an input character sequence, then the C code is executed.

Slide 15: Section 3: C Functions
- Added to the lexical analyzer
- Depending on the lex/flex version, you may need to add the function:
    int yywrap(void)
    { return 1; }
  - returning 1 signals that there is no more input after the end of the input file, so the lexer can terminate

Slide 16: 3. Removing Whitespace (white.l)
    whitespace [ \t\n]
    %%
    {whitespace}    ;
    .               { ECHO; }
    %%
    int yywrap(void)
    { return 1; }

    int main(void)
    { yylex();    // the lexical analyzer
      return 0;
    }
Callouts: whitespace is the name and [ \t\n] is its RE; the lone ; is an empty action; ECHO is a macro.

Slide 17: Usage
    > flex white.l
    > gcc -Wall -o white lex.yy.c
    > ./white < white.l
    /*white.l*//*AndrewDavison,May...
    >
Callout: lex.yy.c is the flex output file.
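To see what the two rules in white.l amount to, here is a hand-written C sketch of the same filtering. It is not the code that flex generates; strip_whitespace is a hypothetical name, and it works on a string in memory rather than on stdin.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src to dst, dropping every space, tab, and newline
 * (the same characters as the RE [ \t\n]).
 * dst must have room for strlen(src) + 1 bytes.
 * Returns the number of characters kept. */
size_t strip_whitespace(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    size_t kept = 0;
    for (; *src != '\0'; src++) {
        if (*src == ' ' || *src == '\t' || *src == '\n')
            continue;             /* like the empty action: discard */
        dst[kept++] = *src;       /* like ECHO: copy the char through */
    }
    dst[kept] = '\0';
    return kept;
}
```

Calling it on "a b\tc\nd" leaves "abcd" in dst and returns 4.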

Slide 18: 4. Printing Linenos (linenos.l)
    %{
    int lineno = 1;
    %}
    %%
    ^(.*)\n    { printf("%4d\t%s", lineno, yytext);
                 lineno++; }
    %%
    int yywrap(void)
    { return 1; }
continued

Slide 19: linenos.l (continued)
    int main(int argc, char *argv[])
    {
      if (argc > 1) {
        FILE *file = fopen(argv[1], "r");
        if (file == NULL) {
          printf("Error opening %s\n", argv[1]);
          exit(1);
        }
        yyin = file;
      }
      yylex();
      fclose(yyin);
      return 0;
    }

Slide 20: Built-in Variables
- yytext holds the matched string.
- yyin is the input stream.
- yyleng holds the length of the matched string.
There are several other built-in variables in lex.

Slide 21: Usage
    > flex linenos.l
    > gcc -Wall -o linenos lex.yy.c
    > ./linenos textfile.txt
or
    > ./linenos < textfile.txt

Slide 22: Output
    > ./linenos < linenos.l
       1
       2    /* linenos.l */
       3    /* Andrew Davison, March 2005 */
       4
       5    %{
       6    int lineno = 1;
       7    %}
       8
       9    %%
       :    :
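The behaviour of the ^(.*)\n rule can be imitated in plain C. This is a sketch under assumptions, not flex output: number_lines is a hypothetical name, and it takes its text from a string instead of yyin. As in the lex version, only lines that end in a newline get a number; any trailing text without a newline is copied through unnumbered, which is what lex's default rule would do with it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes text to out, prefixing each newline-terminated line with
 * its number in the same "%4d\t" format used by linenos.l.
 * Returns how many lines were numbered. */
int number_lines(const char *text, FILE *out)
{
    assert(text != NULL && out != NULL);
    int lineno = 1;
    const char *nl;
    while ((nl = strchr(text, '\n')) != NULL) {
        /* %.*s prints exactly the characters before the newline */
        fprintf(out, "%4d\t%.*s\n", lineno, (int)(nl - text), text);
        lineno++;
        text = nl + 1;
    }
    fputs(text, out);   /* leftover text with no newline: unnumbered */
    return lineno - 1;
}
```

For example, number_lines("a\n\nb\n", stdout) numbers three lines, including the empty one.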

Slide 23: 5. Counting (counter.l)
    %{
    int charCount = 0, wordCount = 0, lineCount = 0;
    %}
    word [^ \t\n]*
    %%
    {word}    { wordCount++; charCount += yyleng; }
    \n        { charCount++; lineCount++; }
    .         { charCount++; }
    %%
    int yywrap(void)
    { return 1; }
continued

Slide 24: counter.l (continued)
    int main(void)
    { yylex();
      printf("Characters %d, Words: %d, Lines: %d\n",
             charCount, wordCount, lineCount);
      return 0;
    }

Slide 25: Usage
    > flex counter.l
    > gcc -Wall -o counter lex.yy.c
    > ./counter < counter.l
    Characters 496, Words: 78, Lines: 29
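For comparison, the same three counts can be computed by a hand-written loop. This is only a sketch of what the rules in counter.l add up to, not the generated lexer; struct counts and count_text are hypothetical names, and the input is a string rather than stdin. A word is a maximal run of characters other than space, tab, and newline, which is what the longest match of {word} gives.

```c
#include <assert.h>
#include <stddef.h>

struct counts { int chars, words, lines; };

/* Counts characters, words, and newlines in text. */
struct counts count_text(const char *text)
{
    assert(text != NULL);
    struct counts c = {0, 0, 0};
    int in_word = 0;               /* are we inside a word right now? */
    for (; *text != '\0'; text++) {
        c.chars++;
        if (*text == '\n')
            c.lines++;
        if (*text == ' ' || *text == '\t' || *text == '\n') {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;           /* first char of a new word */
            c.words++;
        }
    }
    return c;
}
```

For the input "hello world\nfoo\n" this gives 16 characters, 3 words, and 2 lines.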

Slide 26: 6. Counting IDs (ids.l)
    %{
    int count = 0;
    %}
    digit     [0-9]
    letter    [A-Za-z]
    id        {letter}({letter}|{digit})*
    %%
    {id}    { count++; }
    .       ;    /* ignore other things */
    \n      ;
    %%
continued

Slide 27: ids.l (continued)
    int yywrap(void)
    { return 1; }

    int main()
    { yylex();
      printf("No. of Idents: %d\n", count);
      return 0;
    }

Slide 28: Usage
    > flex ids.l
    > gcc -Wall -o ids lex.yy.c
    > ./ids < test1.txt
    No. of Idents: 6
    > l test1.txt
    this is a test 177 23 bing2 *((() this5
    >
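The count of 6 can be checked with a small hand-written scanner. This is a sketch, not the flex-generated code: count_ids is a hypothetical name, it reads from a string, and it assumes the default "C" locale so that isalpha and isalnum agree with [A-Za-z] and [A-Za-z0-9].

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Counts identifiers: a letter followed by any letters or digits. */
int count_ids(const char *text)
{
    assert(text != NULL);
    int count = 0;
    while (*text != '\0') {
        if (isalpha((unsigned char)*text)) {
            count++;
            /* consume the whole identifier (longest match) */
            while (isalnum((unsigned char)*text))
                text++;
        } else {
            text++;   /* like the . and \n rules: skip one char */
        }
    }
    return count;
}
```

On the contents of test1.txt it finds this, is, a, test, bing2, and this5, and skips 177, 23, and the punctuation.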

Slide 29: 7. Matching Rules
1. A rule is chosen that matches the biggest amount of input.
    beg      {…}
    begin    {…}
   Both rules can match the input string "beginning", but the second rule is chosen because it matches more.
continued

Slide 30: Matching Rules (continued)
2. If two rules can match the same amount of input, then the first rule is used.
    begin     {… }
    [a-z]+    {…}
   Both rules can match the input string "begin", so the first rule is chosen.
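The two matching rules can be sketched together in C. This is an illustration only, not how flex works internally (flex builds one automaton for all the rules). The names match_literal, match_lower, and choose_rule are hypothetical, and the three rules beg, begin, and [a-z]+ are put in a single list, in that order, for the sake of the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Match length if input starts with the literal kw, else 0. */
static size_t match_literal(const char *input, const char *kw)
{
    size_t n = strlen(kw);
    return strncmp(input, kw, n) == 0 ? n : 0;
}

/* Length of the longest prefix matching [a-z]+ (0 if none). */
static size_t match_lower(const char *input)
{
    size_t n = 0;
    while (input[n] >= 'a' && input[n] <= 'z')
        n++;
    return n;
}

/* Rules in file order: 0 is beg, 1 is begin, 2 is [a-z]+.
 * Returns the index of the chosen rule, or -1 if none matches,
 * and stores the match length in *len. */
int choose_rule(const char *input, size_t *len)
{
    assert(input != NULL && len != NULL);
    size_t lengths[3] = {
        match_literal(input, "beg"),
        match_literal(input, "begin"),
        match_lower(input),
    };
    int best = -1;
    *len = 0;
    for (int i = 0; i < 3; i++) {
        /* strictly longer wins; on a tie the earlier rule is kept */
        if (lengths[i] > *len) {
            *len = lengths[i];
            best = i;
        }
    }
    return best;
}
```

With the input "begin(", begin beats beg because it is longer and beats [a-z]+ because it comes first, so rule 1 is chosen. With all three rules present, "beginning" goes to [a-z]+, which matches all 9 characters.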

Slide 31: 8. More Information
- Lex and Yacc, by Levine, Mason, and Brown; O'Reilly, 2nd edition (in our library)
- On UNIX:
  - man lex
  - info lex
continued

Slide 32: More Information (continued)
- A Compact Guide to Lex & Yacc, by Tom Niemann: http://epaperpress.com/lexandyacc/
  - with several calculator examples, which I'll be discussing when we get to yacc
  - it's also on the course website in the "Niemann Tutorial" subdirectory of "Useful Info": http://fivedots.coe.psu.ac.th/Software.coe/Compilers/

