Scanner Generation
  Using SLK and Flex++, Followed by a Demo
  Copyright © 2015-2017 Curt Hill
Introduction
  There are a variety of parser generators and scanner generators
  The standards for UNIX seem to be lex and yacc
  Yacc seems to have replaced something earlier
  The GNU versions are flex and bison
  We will use flex++, which is somewhat more parameterizable
  We will also use SLK, which seems to accept a better set of languages
Flex++
  This is a scanner generator
  Its input has three sections
    Definitions for later use
    Rules that accept a string and produce a token
    Code which is passed through as-is
  These three are separated by %%
  The percent sign also introduces options and a variety of other things
Theory of Operation
  Almost all scanner generators produce a program that is a character-driven finite state automaton
  The automaton is built from regular expressions
  Each character moves the automaton from one state to another
  Self loops are possible
  Some of these states recognize the completion of a keyword or multi-character symbol
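The idea can be sketched by hand in C++. This is only an illustration under stated assumptions, not what Flex++ actually emits: the state names and the `classify` helper are hypothetical, and the automaton only distinguishes the keyword `int` from other identifiers.

```cpp
#include <cctype>
#include <string>

// Hypothetical states for a tiny automaton; the names are illustrative only.
enum class State { Start, SawI, SawIn, SawInt, Ident, Reject };

// What the automaton has recognized once the input is exhausted.
enum class Kind { Keyword, Identifier, Nothing };

// One transition: the current state plus one character gives the next state.
State step(State s, char c) {
    const bool letter = std::isalpha(static_cast<unsigned char>(c)) != 0;
    const bool digit  = std::isdigit(static_cast<unsigned char>(c)) != 0;
    switch (s) {
        case State::Start:
            if (c == 'i') return State::SawI;
            return letter ? State::Ident : State::Reject;
        case State::SawI:
            if (c == 'n') return State::SawIn;
            return (letter || digit) ? State::Ident : State::Reject;
        case State::SawIn:
            if (c == 't') return State::SawInt;
            return (letter || digit) ? State::Ident : State::Reject;
        case State::SawInt:
        case State::Ident:  // Ident loops on itself for further letters or digits
            return (letter || digit) ? State::Ident : State::Reject;
        case State::Reject:
            return State::Reject;
    }
    return State::Reject;
}

// Run the automaton over the whole text and report the final state's meaning.
Kind classify(const std::string& text) {
    State s = State::Start;
    for (char c : text) s = step(s, c);
    if (s == State::SawInt) return Kind::Keyword;
    if (s == State::SawI || s == State::SawIn || s == State::Ident) {
        return Kind::Identifier;
    }
    return Kind::Nothing;
}
```

Here `SawInt` is a state that recognizes the completion of a keyword, while `Ident` shows a self loop. A generated scanner normally encodes the same transitions in tables rather than in a switch.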
Regular Expressions
  The typical way to describe what you want is with regular expressions
  Recall that a type 3 language is regular
  Every regular language may be recognized by a finite state automaton
  Thus we will define what we mean by an identifier with a regular expression
  Or an integral constant, floating constant, character string, etc.
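As a rough illustration, the standard C++ `<regex>` library can check such definitions outside of any scanner generator. The three patterns below are assumptions chosen for the sketch, not the definitions used later in this presentation, and the function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Assumed pattern: a letter followed by any number of letters or digits.
bool isIdentifier(const std::string& s) {
    static const std::regex re("[A-Za-z][A-Za-z0-9]*");
    return std::regex_match(s, re);
}

// Assumed pattern: one or more decimal digits.
bool isIntegerConstant(const std::string& s) {
    static const std::regex re("[0-9]+");
    return std::regex_match(s, re);
}

// Assumed pattern: digits, a decimal point, then more digits.
bool isFloatingConstant(const std::string& s) {
    static const std::regex re("[0-9]+\\.[0-9]+");
    return std::regex_match(s, re);
}
```

`std::regex_match` succeeds only when the whole string matches, which is the question a scanner asks of each candidate lexeme.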
Sections
  As mentioned, there are three sections
    Definition
    Rules
    Code
  Separated by the %% directive
  We will now look at these
Definition Section
  As the name suggests, we set up directives and definitions for later use
  Directives are options used in Flex++
    Usually start with a %
  We also define regular expressions we will use later in one or more rules
Directives
  %name scanner name
    This is the class name
  %header{ … %}
    Code that will be inserted at the beginning of the file: includes etc.
    May also use %{ … %}
  %define name content
    Defines a macro
  Next screen shows some predefined macros
Predefined Directives
  %define TEXT yytext
    The text of something that matched a regular expression
    Need this for identifiers and the like
  %define LENG yyleng
    The length of the text
  %define LEX yylex
    Scanner function
Return Type
  %define LEX_RETURN int
    The return type of LEX
    The default is int
  If you want a class there it should have an int constructor
  Could also be void *
  This should signify the end of file token
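A class used as the return type might look like the following minimal sketch. Everything here is hypothetical (the `Token` class, its members, and the `fakeLex` stand-in), and it assumes the usual lex convention that code 0 marks end of file.

```cpp
#include <string>

// Hypothetical token class; none of these names come from Flex++ itself.
class Token {
public:
    // The int constructor lets scanner code written as `return 0;` or
    // `return SOME_CODE;` build a Token implicitly.
    Token(int code) : code_(code) {}
    Token(int code, const std::string& text) : code_(code), text_(text) {}

    int code() const { return code_; }
    const std::string& text() const { return text_; }

    // Assumption: code 0 is reserved for end of file.
    bool isEndOfFile() const { return code_ == 0; }

private:
    int code_;
    std::string text_;
};

// Stand-in for a scanner function declared to return Token instead of int.
Token fakeLex(bool atEnd) {
    if (atEnd) return 0;     // converted through Token(int)
    return Token(42, "x");   // 42 is an arbitrary illustrative code
}
```

The non-explicit `Token(int)` constructor is the design point: it lets an action that returns a plain integer code still compile when the return type is a class.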
More Predefined
  %define CLASS name
    Defaults to same as %name
  %define INHERIT name
    Only needed if your class is a derivation
  %define MEMBERS mem
    Extra member data for the class
  %define CONSTRUCTOR_INIT lg
    The initialization list, including the :
  Any of these predefined names may be changed to something else
Named Expressions
  The definition section may also contain any regular expressions that will be handy in the next section
  The format is: NAME expression
    Where NAME is the name you will use later and expression is the regular expression
  Example: DIGIT [0-9]
    This is a set that matches any digit
Example Definitions
  %name Scanner
  %define MEMBERS public: int line, column;
  %define CONSTRUCTOR_INIT : line(1), column(1)
  %header{
  #include <sstream>
  #include <iostream>
  #include "CDHConstants.h"
  using namespace std;
  %}
  LETTER [A-Za-z]
  DIGIT [0-9]
  DIGIT1 [1-9]
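Named expressions are substituted into the rules that mention them. A rough way to see the effect in plain C++ is to keep the same three names as strings and paste them into larger patterns. The helper names and the identifier pattern are assumptions made for this sketch, and `std::regex` stands in for the scanner generator.

```cpp
#include <regex>
#include <string>

// The same named pieces as the definitions above, held as plain strings.
const std::string LETTER = "[A-Za-z]";
const std::string DIGIT  = "[0-9]";
const std::string DIGIT1 = "[1-9]";

// {DIGIT1}{DIGIT}* : a nonzero digit followed by any number of digits.
bool matchesNumber(const std::string& s) {
    static const std::regex re("(" + DIGIT1 + ")(" + DIGIT + ")*");
    return std::regex_match(s, re);
}

// {LETTER}({LETTER}|{DIGIT})* : a hypothetical identifier pattern.
bool matchesIdentifier(const std::string& s) {
    static const std::regex re("(" + LETTER + ")(" + LETTER + "|" + DIGIT + ")*");
    return std::regex_match(s, re);
}
```

Each name is wrapped in parentheses when pasted in, so an operator such as `*` applies to the whole named expression rather than to its last character.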
Rules Section
  You make rules
  Each rule matches a construct
  A match provokes an action
  Each rule matches a terminal and returns the token
  Or it disposes of things that may be ignored
  The format is: RE action
    Where RE is a regular expression and action is C++ code to execute upon match
Rule Examples
  Recall the definitions before
  Here is a blank killer:
    " " { ++column; }
  The column variable is defined earlier
  The C++ code could be multiple lines
    For example, removing comments will take some work
  Here is a terminal:
    "=" { ++column; cout << "equal\n"; return EQUAL_; }
More Examples
  Reserved words are easy:
    "int" { column += 3; cout << "int\n"; return INT_; }
  Other terminals are harder:
    {DIGIT1}{DIGIT}* { column += strlen(yytext); cout << "number: " << yytext << "\n"; return NUMBER_; }
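To make the effect of these rules concrete, here is a hand-written C++ sketch that behaves roughly like the four example rules: the blank killer, "=", "int", and the number rule. It is not generated code. The token code values and the `ScanResult` and `scanLine` names are assumptions, since the real codes would come from a header such as CDHConstants.h, which is not shown here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token codes; the real values are not shown in these slides.
enum TokenCode { EQUAL_ = 1, INT_ = 2, NUMBER_ = 3 };

struct ScanResult {
    std::vector<int> tokens;  // codes in the order they were recognized
    int column = 1;           // column just past the last consumed character
};

ScanResult scanLine(const std::string& input) {
    ScanResult r;
    std::size_t i = 0;
    while (i < input.size()) {
        const char c = input[i];
        if (c == ' ') {                                 // " "   { ++column; }
            ++r.column;
            ++i;
        } else if (c == '=') {                          // "="   returns EQUAL_
            ++r.column;
            ++i;
            r.tokens.push_back(EQUAL_);
        } else if (input.compare(i, 3, "int") == 0) {   // "int" returns INT_
            r.column += 3;
            i += 3;
            r.tokens.push_back(INT_);
        } else if (c >= '1' && c <= '9') {              // {DIGIT1}{DIGIT}*
            std::size_t len = 1;
            while (i + len < input.size() &&
                   std::isdigit(static_cast<unsigned char>(input[i + len]))) {
                ++len;
            }
            r.column += static_cast<int>(len);
            i += len;
            r.tokens.push_back(NUMBER_);
        } else {
            break;  // no rule matches this character; stop scanning
        }
    }
    return r;
}
```

Scanning the text `int = 42` with this sketch yields the codes for INT_, EQUAL_ and NUMBER_ in that order. The sketch collects tokens in a vector for convenience, whereas the rules above return one token per call to the scanner function.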
Code Section
  This is just C++ code that is copied as-is onto the end of the file
  Often it is nothing
  It may be a main function that tests the scanner
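A test driver of that kind usually just calls the scanner function until it reports end of input. The sketch below shows the shape of such a loop. `StubScanner` is a hand-written stand-in, an assumption made so the loop can run without Flex++ output, and it assumes that a return value of 0 means end of input.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hand-written stand-in for the generated scanner class (hypothetical).
class StubScanner {
public:
    explicit StubScanner(std::vector<int> codes) : codes_(std::move(codes)) {}

    // Mimics the scanner function: the next token code, or 0 at end of input.
    int yylex() { return next_ < codes_.size() ? codes_[next_++] : 0; }

private:
    std::vector<int> codes_;
    std::size_t next_ = 0;
};

// The kind of loop a test main in the code section might run.
template <class ScannerT>
int countTokens(ScannerT& scanner) {
    int count = 0;
    while (scanner.yylex() != 0) ++count;
    return count;
}
```

Because `countTokens` is a template, the same loop could in principle be pointed at the generated class, provided it exposes a member with the same name and the same end-of-input convention.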
Command Line Options
  -ofilename
    Gives the output name with extension
    Default is lex.yy.c
  -hfilename
    Generates a .h header for using this in another file
  -L
    Do not put #line directives in the output
  There are many others
  The input file is last in the line
Process 1
  We saw this in a previous presentation
  Make the BNF
  Feed this into your parser generator
  Make sure reserved words are all that you intended
  Design your token
  Start the definition section
  Determine what the scanner will return
Process 2
  Start the rules section
  For each terminal create a rule
  Tokens should return an initialized object
  Scanner should now be ready
Finally
  Next we do a demo
    This one with or without original BNF
  Then an assignment
  Then consider the generated parser