Published by Reginald Carter. Modified over 9 years ago.
1
Scanner Generation Using SLK and Flex++
Followed by a Demo
Copyright © 2015 Curt Hill
2
Introduction
There are a variety of parser generators and scanner generators
The standards for UNIX seem to be lex and yacc
  – Yacc seems to have replaced something earlier
The GNU versions are flex and bison
We will use flex++, which is somewhat more parameterizable
We will also use SLK, which seems to accept a better set of languages
3
Flex++
This is a scanner generator
It takes an input of three sections
  – Definitions for later use
  – Rules that accept a string and produce a token
  – Code which is passed through as-is
These three are separated by lines containing only %%
A single % is used to introduce options and a variety of other things as well
4
Theory of Operation
Almost all scanner generators produce a program that is a character-driven finite state automaton
Each character moves the automaton from one state to another
  – Self loops are possible
Some of these states recognize the completion of a keyword or multi-character symbol
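The idea on this slide can be sketched by hand in C++. This is only an illustration, not what Flex++ actually emits: the state names, the functions step and accepts, and the choice of the pattern [1-9][0-9]* are all assumptions made for the example.

```cpp
#include <cassert>
#include <string>

// States of a hypothetical automaton for the pattern [1-9][0-9]*.
enum class State { Start, InNumber, Dead };

// One transition: the current state plus one input character gives the next state.
State step(State s, char c) {
    switch (s) {
    case State::Start:
        return (c >= '1' && c <= '9') ? State::InNumber : State::Dead;
    case State::InNumber:
        // Self loop: further digits keep the automaton in the same state.
        return (c >= '0' && c <= '9') ? State::InNumber : State::Dead;
    case State::Dead:
        return State::Dead;
    }
    return State::Dead;
}

// Run the automaton over the whole string; InNumber is the only accepting state.
bool accepts(const std::string& text) {
    State s = State::Start;
    for (char c : text) s = step(s, c);
    return s == State::InNumber;
}

// Small usage example.
void check_automaton() {
    assert(accepts("42"));
    assert(accepts("100"));
    assert(!accepts("007"));  // a leading zero is not allowed by [1-9][0-9]*
    assert(!accepts(""));
}
```

A generated scanner usually stores the transitions in tables instead of a switch, but the character-by-character movement between states is the same.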
5
Regular Expressions
The typical way to describe what you want is using regular expressions
  – Recall that a type 3 language is regular
  – Every regular language may be recognized by a finite state automaton
Thus we will define what we mean by an identifier with a regular expression
  – Or an integral constant, floating constant, character string, etc.
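To see such definitions in action outside of Flex++, here is a minimal sketch using std::regex from the C++ standard library. The two patterns and the function names is_identifier and is_integer are assumptions for illustration; they are not taken from the slides.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical definition of an identifier: a letter followed by letters or digits.
bool is_identifier(const std::string& s) {
    static const std::regex pattern("[A-Za-z][A-Za-z0-9]*");
    return std::regex_match(s, pattern);  // the whole string must match
}

// Hypothetical definition of an integral constant without a leading zero.
bool is_integer(const std::string& s) {
    static const std::regex pattern("[1-9][0-9]*");
    return std::regex_match(s, pattern);
}

// Small usage example.
void check_patterns() {
    assert(is_identifier("count1"));
    assert(!is_identifier("1count"));
    assert(is_integer("42"));
    assert(!is_integer("042"));
}
```

std::regex interprets the pattern at run time, whereas a scanner generator compiles the expressions into an automaton ahead of time.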
6
Sections
As mentioned there are three sections
  – Definition
  – Rules
  – Code
Separated by the %% delimiter
We will now look at these
7
Definition Section
As the name suggests, we set up directives and definitions for later use
Directives are options used in Flex++
  – Usually start with a %
We also define regular expressions we will use later
8
Directives
%name scanner_name
  – This is the class name
%header{ … %}
  – Code that will be inserted at the beginning of the file: includes etc.
  – May also use %{ … %}
%define name content
  – Defines a macro
  – Next screen shows some predefined macros
9
Predefined Directives
%define TEXT yytext
  – The text of something that matched a regular expression
  – Need this for identifiers and the like
%define LENG yyleng
  – The length of the text
%define LEX yylex
  – Scanner function
%define LEX_RETURN int
  – The return type of LEX
10
More Predefined
%define CLASS name
  – Defaults to same as %name
%define INHERIT name
  – Only needed if your class is a derivation
%define MEMBERS mem
  – Extra member data for the class
%define CONSTRUCTOR_INIT lg
  – The initialization list, including the leading :
Any of these predefined names may be changed to something else
11
Named Expressions
The definition section may also contain any regular expressions that will be handy in the next section
The format is: NAME expression
  – Where NAME is the name you will use later and expression is the regular expression
Example: DIGIT [0-9]
  – This is a set that matches any digit
12
Example Definitions
%name Scanner
%define MEMBERS public: int line, column;
%define CONSTRUCTOR_INIT : line(1), column(1)
%header{
#include
#include "CDHConstants.h"
using namespace std;
%}
LETTER [A-Za-z]
DIGIT [0-9]
DIGIT1 [1-9]
13
Rules Section
This is where you make rules
Each rule matches a construct
A match provokes an action
Each rule matches a terminal and returns the token
Or it disposes of things that may be ignored
The format is: RE action
  – Where RE is a regular expression and action is C++ code to execute upon match
14
Rule Examples
Recall the definitions before
Here is a blank killer:
  " " { ++column; }
  – The column variable is defined earlier
The C++ code could be multiple lines
  – For example, removing comments will take some work
Here is a terminal:
  "=" { ++column; cout << "equal\n"; return EQUAL_; }
15
More Examples
Reserved words are easy:
  "int" { column += 3; cout << "int\n"; return INT_; }
Other terminals are harder:
  {DIGIT1}{DIGIT}* { column += strlen(yytext); cout << "number: " << yytext << "\n"; return NUMBER_; }
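To make the effect of these rules concrete, here is a hand-written C++ sketch that behaves roughly like the blank, "=", "int" and number rules taken together. It is not Flex++ output. The token names EQUAL_, INT_ and NUMBER_ come from the slides; their numeric values, the END_ and ERROR_ tokens, and the HandScanner type are assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Token codes: names from the slides, values and END_/ERROR_ are hypothetical.
enum Token { END_ = 0, EQUAL_ = 1, INT_ = 2, NUMBER_ = 3, ERROR_ = 4 };

struct HandScanner {
    std::string input;    // text being scanned
    std::size_t pos = 0;  // index of the next unread character
    int column = 1;       // column of the next unread character
    std::string text;     // most recent lexeme, playing the role of yytext

    explicit HandScanner(std::string s) : input(std::move(s)) {}

    Token next() {
        // Blank killer: " " { ++column; }
        while (pos < input.size() && input[pos] == ' ') { ++pos; ++column; }
        if (pos >= input.size()) return END_;

        const std::size_t start = pos;
        const char c = input[pos];
        Token result = ERROR_;
        if (c == '=') {
            ++pos;
            result = EQUAL_;
        } else if (c >= '1' && c <= '9') {
            // {DIGIT1}{DIGIT}*: one non-zero digit, then any number of digits.
            ++pos;
            while (pos < input.size() &&
                   std::isdigit(static_cast<unsigned char>(input[pos]))) ++pos;
            result = NUMBER_;
        } else if (input.compare(pos, 3, "int") == 0) {
            pos += 3;
            result = INT_;
        } else {
            ++pos;  // consume one unknown character and report an error
        }
        text = input.substr(start, pos - start);
        column += static_cast<int>(text.size());
        return result;
    }
};

// Small usage example.
void check_scanner() {
    HandScanner s("int 42 = 7");
    assert(s.next() == INT_ && s.text == "int" && s.column == 4);
    assert(s.next() == NUMBER_ && s.text == "42" && s.column == 7);
    assert(s.next() == EQUAL_ && s.column == 9);
    assert(s.next() == NUMBER_ && s.text == "7");
    assert(s.next() == END_);
}
```

Unlike a generated scanner, this sketch does not apply a longest-match rule across all patterns, so it would split a word such as "integer" after "int".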
16
Code Section
This is just C++ code that is copied as-is onto the end of the file
Often it is nothing
It may be a main function that tests the scanner
17
Command Line Options
-ofilename
  – Gives the output name with extension
  – Default is lex.yy.c
-hfilename
  – Generates a .h header for using this in another file
The input file is last in the line
18
Process 1
Not all that difficult
Craft the BNF
Feed this into your parser generator
Start creating the input to Flex++
  – Each parser generator generates a file that will list the terminals
In SLK this is the XXXKeywords.txt file or the XXXConstants.h
19
Process 2
Start the definition section
  – Name the scanner
  – Set whatever definitions make sense
  – The header define gets includes
Determine what the scanner will return:
  – Integer
  – Enumeration
  – Object
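The three choices of return type might look like this in C++. Every name below (EQUAL_CODE, TokenKind, TokenInfo, make_token) is hypothetical and only meant to show the trade-off; in practice the parser generator's constants file decides the actual codes.

```cpp
#include <cassert>
#include <string>

// Option 1: a plain integer code (hypothetical value).
const int EQUAL_CODE = 1;

// Option 2: an enumeration, so the compiler checks the token names.
enum class TokenKind { Equal, Int, Number };

// Option 3: an object that carries the kind plus the lexeme and its position.
struct TokenInfo {
    TokenKind kind;
    std::string text;
    int line;
    int column;
};

// Helper that bundles what a rule action knows at match time.
TokenInfo make_token(TokenKind kind, const std::string& text, int line, int column) {
    return TokenInfo{kind, text, line, column};
}

// Small usage example.
void check_tokens() {
    TokenInfo t = make_token(TokenKind::Number, "42", 1, 5);
    assert(t.kind == TokenKind::Number);
    assert(t.text == "42");
    assert(t.line == 1 && t.column == 5);
}
```

An integer is the simplest to hand to a parser, while an object lets error messages report the lexeme and position without extra scanner state.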
20
Process 3
Start the rules section
For each terminal create a rule
  – Punctuation and reserved words are easy
  – Numbers, names are somewhat harder
  – Comments are hardest
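One reason comments are harder is that they can span lines, so the line and column counters need care. The following C++ sketch shows the bookkeeping for a C-style /* ... */ comment. The comment syntax, the Position type and the function skip_block_comment are assumptions for illustration; the slides do not say which comment form the language uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Position {
    int line = 1;
    int column = 1;
};

// Skips a /* ... */ comment that starts at input[pos].
// Returns the index just past the closing */, std::string::npos if the
// comment never closes, or pos unchanged if no comment starts there.
std::size_t skip_block_comment(const std::string& input, std::size_t pos, Position& where) {
    if (pos + 1 >= input.size() || input[pos] != '/' || input[pos + 1] != '*') {
        return pos;
    }
    pos += 2;
    where.column += 2;
    while (pos < input.size()) {
        if (input[pos] == '*' && pos + 1 < input.size() && input[pos + 1] == '/') {
            where.column += 2;
            return pos + 2;
        }
        if (input[pos] == '\n') {
            ++where.line;      // a newline inside the comment starts a new line
            where.column = 1;
        } else {
            ++where.column;
        }
        ++pos;
    }
    return std::string::npos;  // unterminated comment
}

// Small usage example.
void check_comment() {
    Position p;
    std::size_t end = skip_block_comment("/* a\nb */x", 0, p);
    assert(end == 9);                       // index of 'x'
    assert(p.line == 2 && p.column == 5);   // 'x' sits at line 2, column 5
}
```

In a scanner specification the same logic would live in the action of the rule that matches the comment opener.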
21
Finally
We generate a makefile to do everything but run it
Next we do a demo
Then an assignment
Soon we will consider the generated parser