Compiler Structures
3. Lex
242-437, Semester 2, 2018-2019

Objectives
describe lex
give many examples of lex's use

Overview
1. What is lex (and flex)?
2. Lex Program Format
3. Removing Whitespace (white.l)
4. Printing Line Numbers (linenos.l)
5. Counting (counter.l)
6. Counting IDs (ids.l)
7. Matching Rules
8. More Information

1. What is lex (and flex)?
lex is a lexical analyzer generator.
flex is a fast version of lex, and is the one we'll be using.
lex translates regular expressions (REs) into C code.
The generated code is easy to integrate into C compilers (and other applications).

Uses for Lex
Convert input from one form to another.
Extract information from text files.
Extract tokens for a syntax analyzer.

Using Lex
lex source program (lex.l) -> lex (flex) -> lex.yy.c
lex.yy.c -> C compiler -> a.out
input stream of chars -> a.out -> sequence of tokens

Running Flex
With UNIX:
    > flex foo.l
    > gcc -Wall -o foo lex.yy.c
    > ./foo < inputfile.txt
You may need to add -ll (or -lfl for flex) to the gcc call; it links in the lex library.
You may get warning messages from gcc.

How Lex Works
The lex-generated program (e.g. foo) reads characters from stdin and tries to match a character sequence against its REs.
Once it has matched a sequence, it reads in more characters for the next RE match.

2. Lex Program Format
A lex program has three sections, separated by %% lines:
    REs and/or C code
    %%
    RE/action rules
    %%
    C functions

A Lex Program
The three sections are: 1) C code and REs, 2) RE/action rules, 3) C functions.
    %{
    int charCount=0, wordCount=0, lineCount=0;
    %}
    word [^ \t\n]*
    %%
    {word}  {wordCount++; charCount += yyleng; }
    [\n]    {charCount++; lineCount++;}
    .       {charCount++;}
    %%
    int main(void)
    {
      yylex();
      printf("Chars %d, Words: %d, Lines: %d\n", charCount, wordCount, lineCount);
      return 0;
    }

Section 1: Defining a RE
Format:
    name RE
Examples:
    digit   [0-9]
    letter  [A-Za-z]
    id      {letter}({letter}|{digit})*
    word    [^ \t\n]*

Regular Expressions in Lex
    x         match the char x
    \.        match the char .
    "string"  match the chars of string literally
    .         match any char except \n
    ^         match the beginning of a line
    $         match the end of a line
    [xyz]     match one char: x, y, or z
    [^xyz]    match any char except x, y, and z
    [a-z]     match one char in the range a to z

    r*        closure (match 0 or more r's)
    r+        positive closure (match 1 or more r's)
    r?        optional (match 0 or 1 r)
    r1r2      match r1 then r2 (concatenation)
    r1|r2     match r1 or r2 (union)
    (r)       grouping
    r1/r2     match r1 when followed by r2
    {name}    match the RE defined by name

Example REs (Again)
    [0-9]                                       A single digit.
    [0-9]+                                      An integer.
    [0-9]+(\.[0-9]+)?                           An integer or floating point number.
    [+-]?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?    Integer, floating point, or scientific notation.

Section 2: RE/Action Rule
A rule has the form:
    RE { action }
The RE is either written out directly or given as a {name} defined in section 1.
The action is any C code.
If the RE matches an input character sequence, then the C code is executed.

Section 3: C Functions
These are added to the lexical analyzer.
Depending on the lex/flex version, you may need to add the function:
    int yywrap(void)
    { return 1; }
Returning 1 signals that the lexer can terminate when it reaches the end of the input file.

3. Removing Whitespace (white.l)
    whitespace [ \t\n]
    %%
    {whitespace}  ;            /* name, with an empty action */
    .             { ECHO; }    /* RE, with the ECHO macro as its action */
    %%
    int yywrap(void)
    { return 1; }

    int main(void)
    {
      yylex();    // the lexical analyzer
      return 0;
    }

Usage
    > flex white.l             (the flex output file is lex.yy.c)
    > gcc -Wall -o white lex.yy.c
    > ./white < white.l
    /*white.l*//*AndrewDavison,May...
    >

4. Printing Line Numbers (linenos.l)
    %{
    int lineno = 1;
    %}
    %%
    ^(.*)\n  { printf("%4d\t%s", lineno, yytext);
               lineno++; }
    %%
    int yywrap(void)
    { return 1; }

    int main(int argc, char *argv[])
    {
      if (argc > 1) {
        FILE *file = fopen(argv[1], "r");
        if (file == NULL) {
          printf("Error opening %s\n", argv[1]);
          exit(1);
        }
        yyin = file;    /* read from the named file instead of stdin */
      }
      yylex();
      fclose(yyin);
      return 0;
    }

Built-in Variables
yytext holds the matched string.
yyleng holds the length of the matched string.
yyin is the input stream.
There are several other built-in variables in lex.

Usage
    > flex linenos.l
    > gcc -Wall -o linenos lex.yy.c
    > ./linenos textfile.txt
    > ./linenos < textfile.txt

    > ./linenos < linenos.l
       1
       2    /* linenos.l */
       3    /* Andrew Davison, March 2018 */
       4
       5    %{
       6    int lineno = 1;
       7    %}
       8
       9    %%
       :

5. Counting (counter.l)
    %{
    int charCount = 0, wordCount = 0, lineCount = 0;
    %}
    word [^ \t\n]*
    %%
    {word}  { wordCount++; charCount += yyleng; }
    \n      { charCount++; lineCount++; }
    .       { charCount++; }
    %%
    int yywrap(void)
    { return 1; }

    int main(void)
    {
      yylex();
      printf("Characters %d, Words: %d, Lines: %d\n", charCount, wordCount, lineCount);
      return 0;
    }

Usage
    > flex counter.l
    > gcc -Wall -o counter lex.yy.c
    > ./counter < counter.l
    Characters 496, Words: 78, Lines: 29

6. Counting IDs (ids.l)
    %{
    int count = 0;
    %}
    digit  [0-9]
    letter [A-Za-z]
    id     {letter}({letter}|{digit})*
    %%
    {id}  { count++; }
    .     ;    /* ignore other things */
    \n    ;
    %%
    int yywrap(void)
    { return 1; }

    int main()
    {
      yylex();
      printf("No. of Idents: %d\n", count);
      return 0;
    }

Usage
    > flex ids.l
    > gcc -Wall -o ids lex.yy.c
    > ./ids < test1.txt
    No. of Idents: 6
    > l test1.txt
    this is a test bing2 *((() this5
    >

7. Matching Rules
The rule that matches the longest stretch of input is chosen.
    beg    {…}
    begin  {…}
Both rules can match the start of the input string "beginning", but the second rule is chosen because it matches more.
If two rules can match the same amount of input, then the first rule is used.
    begin   {…}
    [a-z]+  {…}
Both rules can match the input string "begin", so the first rule is chosen.

8. More Information
Lex and Yacc, by Levine, Mason, and Brown, O'Reilly, 2nd edition (in our library)
On UNIX: man lex, info lex
A Compact Guide to Lex & Yacc, by Tom Niemann, http://epaperpress
    with several calculator examples, which I'll be discussing when we get to yacc
    it's also on the course website in the "Niemann Tutorial" subdirectory of "Useful Info"
    Software.coe/Compilers/
