FLEX Fast Lexical Analyzer EECS 6083
Introduction
Flex is a lexical analyzer (scanner) generator.
Flex reads a specification from a user input file or from standard input.
In return, Flex generates C code for a scanner function.
Flow: InputFile.lex or standard input -> Flex -> lex.yy.c, which defines yylex()
Introduction
The generated code can be used in two ways:
– Compiled with the Flex library to produce a scanner executable
Flow: lex.yy.c + Flex library -> C/C++ compiler -> scanner executable
Introduction
The generated code can be used in two ways:
– Compiled with other compiler source code and the Flex library to produce an entire compiler
Flow: lex.yy.c + other compiler source code + Flex library -> C/C++ compiler -> compiler executable
Input File Format
The input file contains three major sections:
– Definitions
– Rules
– User Code (optional)
Sections are separated by a line containing only %%.
* Extracted from Flex User Manual
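Definitions Section
Input format: name definition
– “name” is the identifier given to the pattern being defined, typically the kind of token being scanned for.
– “definition” is the regular expression that characterizes that token. It starts at the first non-whitespace character after the name and runs to the end of the line.
A definition can later be referenced as {name}, which expands to (definition).
* Extracted from Flex User Manual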
Definitions Section
– An unindented comment (a line beginning with /*) is copied verbatim to the output file up to the next */.
– Code bracketed by %{ and %} is copied verbatim to the output, minus the brackets.
– Code bracketed by %top{ and } is placed at the top of the output file. Multiple %top blocks keep their input order.
* Extracted from Flex User Manual
Rules Section
Input format: pattern action
“pattern”:
– describes what the scanner may encounter in its input
– is written using extended regular expressions
– ends at the first non-escaped whitespace character
“action”:
– is the code executed when the pattern is matched
– begins after the pattern and ends either at the end of the line or, if it opens with {, at the matching closing brace }
Comments bracketed by /* and */ are ignored.
Text bracketed by %{ and %} before the first rule can declare variables local to the scanning routine.
* Extracted from Flex User Manual
Input Matching
The scanner matches strings in its input against the patterns the user has defined.
– If more than one pattern matches, the scanner chooses the pattern that matches the most text.
– If two patterns match the same amount of text, the scanner chooses the pattern defined first.
When a string is matched:
– the global pointer yytext is set to the start of the matched text
– the length of the matched text is stored in yyleng
Any input that matches no pattern is copied directly to the output.
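The two selection rules above (longest match first, then earliest rule on a tie) can be sketched in plain C. This is only an illustration: the matcher functions, the rule table, and pick_rule() are hypothetical names invented here, and a real Flex scanner compiles its patterns into a table-driven automaton rather than calling one function per rule.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical rule type: returns how many leading characters of s the
   rule matches (0 means no match). Not part of Flex. */
typedef size_t (*matcher_fn)(const char *s);

/* Matches the literal keyword "if". */
static size_t match_if(const char *s) {
    return (s[0] == 'i' && s[1] == 'f') ? 2 : 0;
}

/* Matches an identifier: a letter followed by letters or digits. */
static size_t match_ident(const char *s) {
    size_t n = 0;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

/* Matches one or more digits. */
static size_t match_number(const char *s) {
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* Rules in definition order: earlier entries win ties. */
enum { RULE_IF = 0, RULE_IDENT = 1, RULE_NUMBER = 2, RULE_COUNT = 3 };
static const matcher_fn rules[RULE_COUNT] = { match_if, match_ident, match_number };

/* Picks the rule matching the most text; on a tie the first rule wins.
   Returns the rule index, or -1 if no rule matches. *len receives the
   length of the chosen match. */
static int pick_rule(const char *input, size_t *len) {
    int best = -1;
    size_t best_len = 0;
    for (int i = 0; i < RULE_COUNT; i++) {
        size_t n = rules[i](input);
        if (n > best_len) {  /* strictly longer only, so ties keep the earlier rule */
            best = i;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

With this table, the input "if" is claimed by the keyword rule because it is listed before the identifier rule, while "iffy" goes to the identifier rule because that match is longer.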
Input Matching
yytext can take the form of a character pointer or a character array.
Pointer option (default):
– Advantages: faster, no overflow issues with very large tokens
– Disadvantages: unput() destroys the current contents of yytext, which hurts portability between lex versions
Array option:
– Advantages: the stored characters can be safely manipulated (internally and externally), and unput() does not destroy yytext
– Disadvantages: slower than the pointer option, and cannot be used with C++ scanner classes
Actions
Many Flex functions and macros can be used inside actions:
– ECHO: copies yytext to the output
– BEGIN: places the scanner in the named start condition
– REJECT: makes the scanner fall back to the second-best matching rule
– yymore(): appends the next matched text to the current yytext instead of replacing it
– yyless(n): keeps the first n characters of the current token and returns the rest to the input stream
– unput(c): puts the character c back onto the input stream, so it is the next character scanned
– input(): reads the next character from the input stream
– YY_FLUSH_BUFFER: flushes the scanner’s internal buffer
– yyterminate(): terminates scanning and returns 0 to the scanner’s caller
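The behaviour of yyless(n) is easy to get backwards, so here is a minimal sketch of its effect using a toy scanner state. The struct mini_scanner and the functions mini_match() and mini_yyless() are hypothetical stand-ins written for this example; they are not Flex's real internals.

```c
#include <string.h>

/* Hypothetical stand-in for scanner state; not Flex's real data layout. */
struct mini_scanner {
    const char *input;  /* whole input text */
    size_t pos;         /* index of the next unread character */
    char text[64];      /* current token, playing the role of yytext */
    size_t leng;        /* current token length, playing the role of yyleng */
};

/* Takes the next n characters as the current token. The caller must make
   sure at least n characters of input remain. */
static void mini_match(struct mini_scanner *s, size_t n) {
    if (n >= sizeof s->text) n = sizeof s->text - 1;
    memcpy(s->text, s->input + s->pos, n);
    s->text[n] = '\0';
    s->leng = n;
    s->pos += n;
}

/* Mimics yyless(n): keep the first n characters of the token and hand the
   remaining ones back to the input so they are scanned again. */
static void mini_yyless(struct mini_scanner *s, size_t n) {
    if (n > s->leng) return;   /* nothing sensible to give back */
    s->pos -= s->leng - n;     /* rewind over the returned characters */
    s->text[n] = '\0';
    s->leng = n;
}
```

If the token "foobar" has just been matched, calling mini_yyless(&s, 3) leaves "foo" as the current token and makes "bar" the next text to be scanned.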
Start Conditions
Start conditions allow state-specific processing to occur.
They are declared in the definitions section.
Types:
– Inclusive (%s): rules for the start condition and rules with no start condition are both active
– Exclusive (%x): only rules qualified with the start condition are active
* Extracted from Flex User Manual
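The difference between inclusive and exclusive start conditions comes down to which rules are allowed to match in a given state. The sketch below encodes that decision as a small C function. The condition names (STRING_SC, COMMENT_SC), the NO_SC marker, and rule_is_active() are all hypothetical and exist only for this illustration; the special <*> prefix is not modelled.

```c
#include <stdbool.h>

/* Hypothetical start conditions, chosen only for illustration. */
enum start_cond { INITIAL_SC = 0, STRING_SC, COMMENT_SC, SC_COUNT };

/* true means declared with %x (exclusive), false means %s (inclusive).
   The initial condition is treated as inclusive. */
static const bool is_exclusive[SC_COUNT] = {
    [INITIAL_SC] = false,
    [STRING_SC]  = false,  /* as if declared: %s STRING  */
    [COMMENT_SC] = true    /* as if declared: %x COMMENT */
};

#define NO_SC (-1)  /* marks a rule written without a <condition> prefix */

/* Decides whether a rule may match while the scanner is in `current`.
   `rule_sc` is the rule's start condition, or NO_SC for a general rule. */
static bool rule_is_active(int rule_sc, enum start_cond current) {
    if (rule_sc != NO_SC)
        return rule_sc == (int)current;  /* prefixed rule: only in its own condition */
    return !is_exclusive[current];       /* general rule: switched off in exclusive conditions */
}
```

Under these assumptions a general rule stays active inside the inclusive STRING_SC but is switched off inside the exclusive COMMENT_SC, which is why exclusive conditions suit sub-languages such as comments that share no rules with the rest of the input.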
Values Available to User
Flex functions and variables available to the user:
– char *yytext: text of the current token, modifiable
– int yyleng: length of the current token text
– FILE *yyin: pointer to the file the scanner is reading from
– FILE *yyout: pointer to the file the scanner is writing to
– void yyrestart( FILE *new_file ): directs the scanner to scan new_file
– YY_START: returns an int corresponding to the current start condition
Interfacing with YACC
Flex is commonly used as the scanner for parsers produced by the parser generator YACC.
– All token definitions are placed in y.tab.h.
– The value associated with a token (for example its text) is stored in the global variable yylval.
– The YACC parser calls yylex() to get the next token.
– yylex() returns the token id and stores the token's value in yylval.
* Extracted from Flex User Manual
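The calling convention between parser and scanner can be sketched without running either tool. In the C below, demo_yylex() is a hand-written stub that follows the same contract as a generated yylex() (return a token id, return 0 at end of input, leave the token's value in a global), and demo_sum() plays the part of the parser. Every name here, including the token ids and demo_yylval, is hypothetical; in a real project the ids would come from y.tab.h and the scanner from Flex.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token ids; with YACC these would come from y.tab.h. */
enum { TOK_EOF = 0, TOK_NUMBER = 258, TOK_PLUS = 259 };

/* Stand-in for yylval: the value attached to the most recent token. */
static int demo_yylval;

/* Remaining input for the stub scanner. */
static const char *demo_input = "";

/* Stub with the same contract as yylex(): returns a token id, returns 0
   at end of input, and leaves any associated value in demo_yylval. */
static int demo_yylex(void) {
    for (;;) {
        if (*demo_input == '\0') return TOK_EOF;
        if (isdigit((unsigned char)*demo_input)) {
            char *end;
            demo_yylval = (int)strtol(demo_input, &end, 10);
            demo_input = end;
            return TOK_NUMBER;
        }
        if (*demo_input == '+') { demo_input++; return TOK_PLUS; }
        demo_input++;  /* skip blanks and anything unrecognised */
    }
}

/* Plays the parser's role: pulls tokens until 0 and sums the numbers. */
static int demo_sum(const char *text) {
    int tok, total = 0;
    demo_input = text;
    while ((tok = demo_yylex()) != TOK_EOF) {
        if (tok == TOK_NUMBER) total += demo_yylval;
    }
    return total;
}
```

The loop in demo_sum() is the essential shape: the parser drives, the scanner answers one token at a time, and the token's value travels through the shared global rather than the return value.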
Generating C++ Scanners
A C++ Flex scanner can be created in two ways:
– Compile the Flex-generated scanner and the Flex library with a C++ compiler
– Run Flex with the -+ option or with %option c++
In the second case Flex generates lex.yy.cc, which works with two scanner classes:
– FlexLexer: gives access to user-visible values (such as yyleng)
– yyFlexLexer: gives access to scanner-specific methods (such as yylex())
* Extracted from Flex User Manual
Survey of Scanner Options
– Batch / interactive (default) mode: the scanner does / does not look ahead one character to recognize a token
– Enable the start condition stack
– yytext as character pointer (default) or array
– Automatically create a main() consisting only of a call to yylex()
– Debug mode: reports when a rule is matched
– Reentrant mode: for multithreaded scanning
* Extracted from Flex User Manual
References
– Flex User Manual (l#Top)
– Quick Tutorial on using Flex with Bison (own-toy-compiler/5/)