

Using Lex

Flex – Lexical Analyzer Generator

A language for specifying lexical analyzers.

    lang.l      --(Flex compiler)-->        lex.yy.c
    lex.yy.c    --(C compiler, -lfl)-->     a.out
    source code --(a.out)-->                tokens

The Structure of a Lex Program

    (Definition section)
    %%
    (Rules section)
    %%
    (User subroutines section)

Flex Programs

    %{
    auxiliary declarations
    %}
    regular definitions
    %%
    translation rules
    %%
    auxiliary procedures

Example 1-1: Word recognizer ch1-02.l

    %{
    /*
     * this sample demonstrates (very) simple recognition:
     * a verb/not a verb.
     */
    %}
    %%
    [\t ]+        /* ignore white space */ ;
    is |
    am |
    are |
    were |
    was |
    be |
    being |
    been |
    do |
    does |
    did |
    will |
    would |
    should |
    can |
    could |
    has |
    have |
    had |
    go            { printf("%s: is a verb\n", yytext); }
    [a-zA-Z]+     { printf("%s: is not a verb\n", yytext); }
    .|\n          { ECHO; /* normal default anyway */ }
    %%
    main()
    {
        yylex();
    }

The Definition Section

Lex copies the material between "%{" and "%}" directly to the generated C file, so you may write any valid C code here.

Rules Section

Each rule is made up of two parts:
– A pattern
– An action

E.g.

    [\t ]+    /* ignore white space */ ;

Rules Section (Cont'd)

E.g.

    is |
    am |
    are |
    were |
    was |
    be |
    being |
    been |
    do |
    does |
    did |
    will |
    would |
    should |
    can |
    could |
    has |
    have |
    had |
    go        { printf("%s: is a verb\n", yytext); }

Rules Section (Cont'd)

E.g.

    [a-zA-Z]+    { printf("%s: is not a verb\n", yytext); }
    .|\n         { ECHO; /* normal default anyway */ }

Lex has a set of simple disambiguating rules:
1. Lex patterns only match a given input character or string once.
2. Lex executes the action for the longest possible match for the current input.

User Subroutines Section

It can consist of any legal C code. Lex copies it to the C file after the end of the Lex-generated code.

    %%
    main()
    {
        yylex();
    }

Regular Expressions

Operators used by Lex:

    (catenation)  adjacent expressions are concatenated
    .     any single character except newline
    *     zero or more repetitions
    []    character class
    ^     negation (when leftmost in a character class)
    $     matches only when followed by a newline, e.g. ab$
    {}    macro expansion of a symbol, e.g. {Digit}+
    \     escape character
    +     one or more repetitions
    ?     zero or one occurrence
    |     alternation (or)
    "…"   the enclosed characters are taken literally
    /     trailing context: ab/cd matches ab only when followed by cd
    ()    grouping for more complex expressions
    -     range (inside a character class)

Functions and Variables

    yylex()   a function implementing the lexical analyzer and returning the token matched
    yytext    a global pointer variable pointing to the lexeme matched
    yyleng    a global variable giving the length of the lexeme matched
    yylval    an external global variable storing the attribute of the token

Functions and Variables (Cont'd)

yywrap()
– If yywrap() returns false (zero), then it is assumed that the function has gone ahead and set up yyin to point to another input file, and scanning continues. If it returns true (non-zero), then the scanner terminates, returning 0 to its caller.
– If you do not supply your own version of yywrap(), then you must either use %option noyywrap or ….

Examples of Regular Expressions

    [0-9]                                            a single digit
    [0-9]+                                           an unsigned integer
    [0-9]*                                           zero or more digits
    -?[0-9]+                                         an optionally signed integer
    [0-9]*\.[0-9]+                                   a fraction such as .5 or 0.5
    ([0-9]+)|([0-9]*\.[0-9]+)                        an integer or a fraction
    -?(([0-9]+)|([0-9]*\.[0-9]+))                    the same, optionally signed
    [eE][-+]?[0-9]+                                  an exponent part
    -?(([0-9]+)|([0-9]*\.[0-9]+))([eE][-+]?[0-9]+)?  a complete signed number

Example 2-1

    %%
    [\n\t ]    ;
    -?(([0-9]+)|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?)    { printf("number\n"); }
    .          ECHO;
    %%
    main()
    {
        yylex();
    }

A Word Counting Program

The definition section:

    %{
    unsigned charCount = 0, wordCount = 0, lineCount = 0;
    %}
    word    [^ \t\n]+
    eol     \n

A Word Counting Program (Cont'd)

The rules section:

    {word}    { wordCount++; charCount += yyleng; }
    {eol}     { charCount++; lineCount++; }
    .         charCount++;

A Word Counting Program (Cont'd)

The user subroutines section:

    main(argc, argv)
    int argc;
    char **argv;
    {
        if (argc > 1) {
            FILE *file;

            file = fopen(argv[1], "r");
            if (!file) {
                fprintf(stderr, "could not open %s\n", argv[1]);
                exit(1);
            }
            yyin = file;
        }
        yylex();
        printf("%d %d %d\n", charCount, wordCount, lineCount);
        return 0;
    }

stripquotes

No tossing mechanism provided.

    int frompos, topos = 0, numquotes = 2;

    for (frompos = 1; frompos < yyleng; frompos++) {
        yytext[topos++] = yytext[frompos];
        if (yytext[frompos] == '"' && yytext[frompos+1] == '"') {
            frompos++;
            numquotes++;
        }
    }
    yyleng -= numquotes;
    yytext[yyleng] = '\0';

Micro Scanner

Auxiliary declarations:

    %option noyywrap
    %{
    #ifndef DEFINE
    #define DEFINE 1
    #include <stdio.h>
    #include "token.h"
    #endif
    #define YYLMAX 33
    char token_buffer[YYLMAX];
    FILE *out_fd, *status_fd;
    extern void check_id(char *);
    extern void list_token_type(token);
    extern void get_file_name(char *, char *, char *);
    %}

Micro Scanner (Cont'd)

Regular definitions:

    letter          [a-zA-Z]
    digit           [0-9]
    KEYWORD         "BEGIN"|"begin"|"END"|"end"|"READ"|"read"|"WRITE"|"write"|"SCANEOF"
    literal         {digit}+
    IDENTIFIER      {letter}+({letter}|{digit}|"_")*
    special_char    "+"|"-"|"*"|":="|"("|")"|";"|","
    comment         "--".*[\n]

Micro Scanner (Cont'd)

Translation rules:

    %%
    {KEYWORD}        { list_token_type(check_reserved(yytext)); }
    {literal}        { list_token_type(INTLITERAL); }
    {IDENTIFIER}     { list_token_type(ID); }
    {special_char}   { list_token_type(check_special_char(yytext)); }
    [\n]             ;
    [ \t]            ;
    {comment}        ;
    .                { lexical_error(yytext); }

Micro Scanner (Cont'd)

Auxiliary procedures:

    %%
    main(int argc, char *argv[])
    {
        if ((yyin = fopen(argv[1], "r")) == NULL) {
            printf("ERROR: file open error!!\n");
            exit(0);
        }
        yylex();
        fclose(yyin);
    }

Micro Scanner (Cont'd)

Helper routines:

    check_reserved        checks whether the lexeme is a reserved word
    check_special_char    checks whether the lexeme is one of + - * ( ) , ; :=
    list_token_type       prints the type of the token

Example

Input:

    begin
        read (A,B);
        A := (B+3)*A;
        B := A*3-B;
        write (A,B,3*4);
    end SCANEOF

Example (Cont'd)

Output token stream:

    BEGIN READ LPAREN ID COMMA ID RPAREN SEMICOLON
    ID ASSIGNOP LPAREN ID PLUSOP INTLITERAL RPAREN MULTOP ID SEMICOLON
    ID ASSIGNOP ID MULTOP INTLITERAL MINUSOP ID SEMICOLON
    WRITE LPAREN ID COMMA ID COMMA INTLITERAL MULTOP INTLITERAL RPAREN SEMICOLON
    END SCANEOF