A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721.

Slides:



Advertisements
Similar presentations
Lexical Analysis Consider the program: #include main() { double value = 0.95; printf("value = %f\n", value); } How is this translated into meaningful machine.
Advertisements

Compiler construction in4020 – lecture 2 Koen Langendoen Delft University of Technology The Netherlands.
Lex -- a Lexical Analyzer Generator (by M.E. Lesk and Eric. Schmidt) –Given tokens specified as regular expressions, Lex automatically generates a routine.
CSc 453 Lexical Analysis (Scanning)
 Lex helps to specify lexical analyzers by specifying regular expression  i/p notation for lex tool is lex language and the tool itself is refered to.
Lex(1) and flex(1). Lex public interface FILE *yyin; /* set before calling yylex() */ int yylex(); /* call once per token */ char yytext[];/* chars matched.
Lexical Analysis with lex(1) and flex(1) © 2011 Clinton Jeffery.
176 Formal Languages and Applications: We know that Pascal programming language is defined in terms of a CFG. All the other programming languages are context-free.
Lexical Analysis - Scanner- Contd Computer Science Rensselaer Polytechnic Compiler Design Lecture 4(01/26/98)
Tutorial 1 Scanner & Parser
Tools for building compilers Clara Benac Earle. Tools to help building a compiler C –Lexical Analyzer generators: Lex, flex, –Syntax Analyzer generator:
Chapter 3 Chang Chi-Chung. The Structure of the Generated Analyzer lexeme Automaton simulator Transition Table Actions Lex compiler Lex Program lexemeBeginforward.
Lecture 2: Lexical Analysis CS 540 George Mason University.
CS 536 Spring Learning the Tools: JLex Lecture 6.
Saumya Debray The University of Arizona Tucson, AZ 85721
Compilers: lex/3 1 Compiler Structures Objectives – –describe lex – –give many examples of lex's use , Semester 1, Lex.
Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
1 YACC Parser Generator. 2 YACC YACC (Yet Another Compiler Compiler) Produce a parser for a given grammar.  Compile a LALR(1) grammar Original written.
Review: Regular expression: –How do we define it? Given an alphabet, Base case: – is a regular expression that denote { }, the set that contains the empty.
Scanning & FLEX CPSC 388 Ellen Walker Hiram College.
FLEX Fast Lexical Analyzer EECS Introduction Flex is a lexical analysis (scanner) generator. Flex is provided with a user input file or Standard.
Flex: A fast Lexical Analyzer Generator CSE470: Spring 2000 Updated by Prasad.
LEX (04CS1008) A tool widely used to specify lexical analyzers for a variety of languages We refer to the tool as Lex compiler, and to its input specification.
Compiler Tools Lex/Yacc – Flex & Bison. Compiler Front End (from Engineering a Compiler) Scanner (Lexical Analyzer) Maps stream of characters into words.
Lexical Analysis: Finite Automata CS 471 September 5, 2007.
Introduction to Lex Ying-Hung Jiang
Introduction to Yacc Ying-Hung Jiang
1 Using Lex. 2 Introduction When you write a lex specification, you create a set of patterns which lex matches against the input. Each time one of the.
IN LINE FUNCTION AND MACRO Macro is processed at precompilation time. An Inline function is processed at compilation time. Example : let us consider this.
1 Using Lex. Flex – Lexical Analyzer Generator A language for specifying lexical analyzers Flex compilerlex.yy.clang.l C compiler -lfl a.outlex.yy.c a.outtokenssource.
Introduction to Lex Fan Wu
Introduction to Lexical Analysis and the Flex Tool. © Allan C. Milne Abertay University v
Lexical Analysis with lex(1) and flex(1) © 2014 Clinton Jeffery.
Flex Fast LEX analyzer CMPS 450. Lexical analysis terms + A token is a group of characters having collective meaning. + A lexeme is an actual character.
Practical 1-LEX Implementation
CSc 453 Lexical Analysis (Scanning)
1 Lex & Yacc. 2 Compilation Process Lexical Analyzer Source Code Syntax Analyzer Symbol Table Intermed. Code Gen. Code Generator Machine Code.
Syntactic Analysis Tools
ICS312 LEX Set 25. LEX Lex is a program that generates lexical analyzers Converting the source code into the symbols (tokens) is the work of the C program.
Applications of Context-Free Grammars (CFG) Parsers. The YACC Parser-Generator. by: Saleh Al-shomrani.
1 LEX & YACC Tutorial February 28, 2008 Tom St. John.
COMMONWEALTH OF AUSTRALIA Copyright Regulations 1969 WARNING This material has been reproduced and communicated to you by or on behalf of Monash University.
1 Steps to use Flex Ravi Chotrani New York University Reviewed By Prof. Mohamed Zahran.
Scanner Generation Using SLK and Flex++ Followed by a Demo Copyright © 2015 Curt Hill.
Announcements Assignment 1 due Wednesday at 11:59PM Quiz 1 on Thursday 1.
LECTURE 6 Scanning Part 2. FROM DFA TO SCANNER In the previous lectures, we discussed how one might specify valid tokens in a language using regular expressions.
More yacc. What is yacc – Tool to produce a parser given a grammar – YACC (Yet Another Compiler Compiler) is a program designed to compile a LALR(1) grammar.
LEX SUNG-DONG KIM, DEPT. OF COMPUTER ENGINEERING, HANSUNG UNIVERSITY.
9-December-2002cse Tools © 2002 University of Washington1 Lexical and Parser Tools CSE 413, Autumn 2002 Programming Languages
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture Ahmed Ezzat.
Compiler Construction Sohail Aslam Lecture Parser Generators  YACC – Yet Another Compiler Compiler appeared in 1975 as a Unix application.  The.
LEX & Yacc Sung-Dong Kim, Dept. of Computer Engineering, Hansung University.
Lexical Analysis.
NFAs, scanners, and flex.
Tutorial On Lex & Yacc.
CSc 453 Lexical Analysis (Scanning)
Using SLK and Flex++ Followed by a Demo
Sung-Dong Kim, Dept. of Computer Engineering, Hansung University
Regular Languages.
TDDD55- Compilers and Interpreters Lesson 2
Review: Compiler Phases:
Compiler Structures 3. Lex Objectives , Semester 2,
Saumya Debray The University of Arizona Tucson, AZ 85721
Compiler Design Yacc Example "Yet Another Compiler Compiler"
More on flex.
Regular Expressions and Lexical Analysis
Systems Programming & Operating Systems Unit – III
NFAs, scanners, and flex.
Compiler Design 3. Lexical Analyzer, Flex
Lexical Analysis - Scanner-Contd
Presentation transcript:

A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

A quick tutorial on Lex2 flex (and lex): Overview Scanner generators:  Helps write programs whose control flow is directed by instances of regular expressions in the input stream. Input: a set of regular expressions + actions Output: C code implementing a scanner: function: yylex() file: lex.yy.c flex (or lex)

A quick tutorial on Lex3 Using flex lex input spec (regexps + actions) file: lex.yy.c yylex() { … } driver code compiler flex user supplies main() {…} or parser() {…}

A quick tutorial on Lex4 flex: input format An input file has the following structure: definitions % rules % user code optional required Shortest possible legal flex input: %

A quick tutorial on Lex5 Definitions A series of:  name definitions, each of the form name definition e.g.: DIGIT [0-9] CommentStart "/*" ID [a-zA-Z][a-zA-Z0-9]*  start conditions  stuff to be copied verbatim into the flex output (e.g., declarations, #includes): – enclosed in %{ … }%, or – indented

A quick tutorial on Lex6 Rules The rules portion of the input contains a sequence of rules. Each rule has the form pattern action where :  pattern describes a pattern to be matched on the input  pattern must be un-indented  action must begin on the same line.

A quick tutorial on Lex7 Patterns Essentially, extended regular expressions.  Syntax: similar to grep (see man page)  > to match “end of file”  Character classes: – [:alpha:], [:digit:], [:alnum:], [:space:], etc. (see man page)  {name} where name was defined earlier. “start conditions” can be used to specify that a pattern match only in specific situations.

A quick tutorial on Lex8 Example %{ #include %} dgt [0-9] % {dgt}+ return atoi(yytext); % void main() { int val, total = 0, n = 0; while ( (val = yylex()) > 0 ) { total += val; n++; } if (n > 0) printf(“ave = %d\n”, total/n); } A flex program to read a file of (positive) integers and compute the average:

A quick tutorial on Lex9 Example %{ #include %} dgt [0-9] % {dgt}+ return atoi(yytext); % void main() { int val, total = 0, n = 0; while ( (val = yylex()) > 0 ) { total += val; n++; } if (n > 0) printf(“ave = %d\n”, total/n); } A flex program to read a file of (positive) integers and compute the average: Definition for a digit (could have used builtin definition [:digit:] instead) Rule to match a number and return its value to the calling routine Driver code (could instead have been in a separate file) definitions rules user code

A quick tutorial on Lex10 Example %{ #include %} dgt [0-9] % {dgt}+ return atoi(yytext); % void main() { int val, total = 0, n = 0; while ( (val = yylex()) > 0 ) { total += val; n++; } if (n > 0) printf(“ave = %d\n”, total/n); } A flex program to read a file of (positive) integers and compute the average: definitions rules user code defining and using a name

A quick tutorial on Lex11 Example %{ #include %} dgt [0-9] % {dgt}+ return atoi(yytext); % void main() { int val, total = 0, n = 0; while ( (val = yylex()) > 0 ) { total += val; n++; } if (n > 0) printf(“ave = %d\n”, total/n); } A flex program to read a file of (positive) integers and compute the average: definitions rules user code defining and using a name char * yytext; a buffer that holds the input characters that actually match the pattern

A quick tutorial on Lex12 Example %{ #include %} dgt [0-9] % {dgt}+ return atoi(yytext); % void main() { int val, total = 0, n = 0; while ( (val = yylex()) > 0 ) { total += val; n++; } if (n > 0) printf(“ave = %d\n”, total/n); } A flex program to read a file of (positive) integers and compute the average: definitions rules user code defining and using a name char * yytext; a buffer that holds the input characters that actually match the pattern Invoking the scanner: yylex() Each time yylex() is called, the scanner continues processing the input from where it last left off. Returns 0 on end-of-file.

Avoiding compiler warnings If compiled using “gcc –Wall” the previous flex file will generate compiler warnings: lex.yy.c: … : warning: `yyunput’ defined but not used lex.yy.c: … : warning: `input’ defined but not used These can be removed using ‘%option’ declarations in the first part of the flex input file: %option nounput %option noinput A quick tutorial on Lex13

A quick tutorial on Lex14 Matching the Input When more than one pattern can match the input, the scanner behaves as follows:  the longest match is chosen;  if multiple rules match, the rule listed first in the flex input file is chosen;  if no rule matches, the default is to copy the next character to stdout. The text that matched (the “token”) is copied to a buffer yytext.

A quick tutorial on Lex15 Matching the Input (cont’d) Pattern to match C-style comments: /* … */ "/*"(. |\n)*"*/" Input: #include /* definitions */ int main(int argc, char * argv[ ]) { if (argc <= 1) { printf(“Error!\n”); /* no arguments */ } printf(“%d args given\n”, argc); return 0; }

A quick tutorial on Lex16 Matching the Input (cont’d) Pattern to match C-style comments: /* … */ "/*"(. |\n)*"*/" Input: #include /* definitions */ int main(int argc, char * argv[ ]) { if (argc <= 1) { printf(“Error!\n”); /* no arguments */ } printf(“%d args given\n”, argc); return 0; } longest match:

A quick tutorial on Lex17 Matching the Input (cont’d) Pattern to match C-style comments: /* … */ "/*"(. |\n)*"*/" Input: #include /* definitions */ int main(int argc, char * argv[ ]) { if (argc <= 1) { printf(“Error!\n”); /* no arguments */ } printf(“%d args given\n”, argc); return 0; } longest match: Matched text shown in blue

A quick tutorial on Lex18 Start Conditions Used to activate rules conditionally.  Any rule prefixed with will be activated only when the scanner is in start condition S. Declaring a start condition S:  in the definition section: %x S – “%x” specifies “exclusive start conditions” – flex also supports “inclusive start conditions” (“%s”), see man pages. Putting the scanner into start condition S:  action: BEGIN(S)

A quick tutorial on Lex19 Start Conditions (cont’d) Example:  [^"]* { …match string body… } – [^"] matches any character other than " – The rule is activated only if the scanner is in the start condition STRING. INITIAL refers to the original state where no start conditions are active. matches all start conditions.

A quick tutorial on Lex20 Using Start Conditions Start conditions let us explicitly simulate finite state machines. This lets us get around the “longest match” problem for C- style comments. / * non-* * * / non -{ /,* } flex input: %x S1, S2, S3 % "/" BEGIN(S1); "*" BEGIN(S2); [^*] ; /* stay in S2 */ "*" BEGIN(S3); "*" ; /* stay in S3 */ [^*/] BEGIN(S2); "/" BEGIN(INITIAL); S1S2S3 FSA for C comments :

A quick tutorial on Lex21 Using Start Conditions Start conditions let us explicitly simulate finite state machines. This lets us get around the “longest match” problem for C- style comments. / * non-* * * / non -{ /,* } flex input: %x S1, S2, S3 % "/" BEGIN(S1); "*" BEGIN(S2); [^*] ; /* stay in S2 */ "*" BEGIN(S3); "*“ ; /* stay in S3 */ [^*/] BEGIN(S2); "/" BEGIN(INITIAL); S1S2S3 FSA for C comments :

A quick tutorial on Lex22 Using Start Conditions Start conditions let us explicitly simulate finite state machines. This lets us get around the “longest match” problem for C- style comments. / * non-* * * / non -{ /,* } flex input: %x S1, S2, S3 % "/" BEGIN(S1); "*" BEGIN(S2); [^*] ; /* stay in S2 */ "*" BEGIN(S3); "*“ ; /* stay in S3 */ [^*/] BEGIN(S2); "/" BEGIN(INITIAL); S1S2S3 FSA for C comments :

A quick tutorial on Lex23 Using Start Conditions Start conditions let us explicitly simulate finite state machines. This lets us get around the “longest match” problem for C- style comments. / * non-* * * / non -{ /,* } flex input: %x S1, S2, S3 % "/" BEGIN(S1); "*" BEGIN(S2); [^*] ; /* stay in S2 */ "*" BEGIN(S3); "*“ ; /* stay in S3 */ [^*/] BEGIN(S2); "/" BEGIN(INITIAL); S1S2S3 FSA for C comments :

A quick tutorial on Lex24 Using Start Conditions Start conditions let us explicitly simulate finite state machines. This lets us get around the “longest match” problem for C- style comments. / * non-* * * / non -{ /,* } flex input: %x S1, S2, S3 % "/" BEGIN(S1); "*" BEGIN(S2); [^*] ; /* stay in S2 */ "*" BEGIN(S3); "*“ ; /* stay in S3 */ [^*/] BEGIN(S2); "/" BEGIN(INITIAL); S1S2S3 FSA for C comments :

A quick tutorial on Lex25 Using Start Conditions Start conditions let us explicitly simulate finite state machines. This lets us get around the “longest match” problem for C- style comments. / * non-* * * / non -{ /,* } flex input: %x S1, S2, S3 % "/" BEGIN(S1); "*" BEGIN(S2); [^*] ; /* stay in S2 */ "*" BEGIN(S3); "*“ ; /* stay in S3 */ [^*/] BEGIN(S2); "/" BEGIN(INITIAL); S1S2S3 FSA for C comments :

A quick tutorial on Lex26 Using Start Conditions Start conditions let us explicitly simulate finite state machines. This lets us get around the “longest match” problem for C- style comments. / * non-* * * / non -{ /,* } flex input: %x S1, S2, S3 % "/" BEGIN(S1); "*" BEGIN(S2); [^*] ; /* stay in S2 */ "*" BEGIN(S3); "*“ ; /* stay in S3 */ [^*/] BEGIN(S2); "/" BEGIN(INITIAL); S1S2S3 FSA for C comments :

A quick tutorial on Lex27 Using Start Conditions Start conditions let us explicitly simulate finite state machines. This lets us get around the “longest match” problem for C- style comments. / * non-* * * / non -{ /,* } flex input: %x S1, S2, S3 % "/" BEGIN(S1); "*" BEGIN(S2); [^*] ; /* stay in S2 */ "*" BEGIN(S3); "*“ ; /* stay in S3 */ [^*/] BEGIN(S2); "/" BEGIN(INITIAL); S1S2S3 FSA for C comments :

A quick tutorial on Lex28 Putting it all together Scanner implemented as a function int yylex();  return value indicates type of token found (encoded as a +ve integer);  the actual string matched is available in yytext. Scanner and parser need to agree on token type encodings  let yacc generate the token type encodings – yacc places these in a file y.tab.h  use “#include y.tab.h” in the definitions section of the flex input file. When compiling, link in the flex library using “-lfl”