Introduction to Lex Fan Wu

Slides:



Advertisements
Similar presentations
Lexical Analysis Consider the program: #include main() { double value = 0.95; printf("value = %f\n", value); } How is this translated into meaningful machine.
Advertisements

COS 320 Compilers David Walker. Outline Last Week –Introduction to ML Today: –Lexical Analysis –Reading: Chapter 2 of Appel.
Lex -- a Lexical Analyzer Generator (by M.E. Lesk and Eric. Schmidt) –Given tokens specified as regular expressions, Lex automatically generates a routine.
 Lex helps to specify lexical analyzers by specifying regular expression  i/p notation for lex tool is lex language and the tool itself is refered to.
176 Formal Languages and Applications: We know that Pascal programming language is defined in terms of a CFG. All the other programming languages are context-free.
Tools for building compilers Clara Benac Earle. Tools to help building a compiler C –Lexical Analyzer generators: Lex, flex, –Syntax Analyzer generator:
Chapter 3 Chang Chi-Chung. The Structure of the Generated Analyzer lexeme Automaton simulator Transition Table Actions Lex compiler Lex Program lexemeBeginforward.
COS 320 Compilers David Walker. Outline Last Week –Introduction to ML Today: –Lexical Analysis –Reading: Chapter 2 of Appel.
Lecture 2: Lexical Analysis CS 540 George Mason University.
A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ
CS 536 Spring Learning the Tools: JLex Lecture 6.
1 Flex. 2 Flex A Lexical Analyzer Generator  generates a scanner procedure directly, with regular expressions and user-written procedures Steps to using.
Compilers: lex/3 1 Compiler Structures Objectives – –describe lex – –give many examples of lex's use , Semester 1, Lex.
Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Machine-independent code improvement Target code generation Machine-specific.
Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Review: Regular expression: –How do we define it? Given an alphabet, Base case: – is a regular expression that denote { }, the set that contains the empty.
Lecture 2: Lexical Analysis
COMP 3438 – Part II - Lecture 2: Lexical Analysis (I) Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ. 1.
Lexical Analysis I Specifying Tokens Lecture 2 CS 4318/5531 Spring 2010 Apan Qasem Texas State University *some slides adopted from Cooper and Torczon.
Scanning & FLEX CPSC 388 Ellen Walker Hiram College.
FLEX Fast Lexical Analyzer EECS Introduction Flex is a lexical analysis (scanner) generator. Flex is provided with a user input file or Standard.
CS308 Compiler Principles Introduction to Yacc Fan Wu Department of Computer Science and Engineering Shanghai Jiao Tong University.
Flex: A fast Lexical Analyzer Generator CSE470: Spring 2000 Updated by Prasad.
LEX (04CS1008) A tool widely used to specify lexical analyzers for a variety of languages We refer to the tool as Lex compiler, and to its input specification.
Lexical Analysis: Finite Automata CS 471 September 5, 2007.
Review: Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Intermediate code generator Code optimizer Code generator Symbol.
JLex Lecture 4 Mon, Jan 24, JLex JLex is a lexical analyzer generator in Java. It is based on the well-known lex, which is a lexical analyzer generator.
Lexical Analysis – Part I EECS 483 – Lecture 2 University of Michigan Monday, September 11, 2006.
Introduction to Lex Ying-Hung Jiang
1 Using Lex. 2 Introduction When you write a lex specification, you create a set of patterns which lex matches against the input. Each time one of the.
IN LINE FUNCTION AND MACRO Macro is processed at precompilation time. An Inline function is processed at compilation time. Example : let us consider this.
1 Using Lex. Flex – Lexical Analyzer Generator A language for specifying lexical analyzers Flex compilerlex.yy.clang.l C compiler -lfl a.outlex.yy.c a.outtokenssource.
Introduction to Lexical Analysis and the Flex Tool. © Allan C. Milne Abertay University v
Flex Fast LEX analyzer CMPS 450. Lexical analysis terms + A token is a group of characters having collective meaning. + A lexeme is an actual character.
Practical 1-LEX Implementation
CSc 453 Lexical Analysis (Scanning)
1 Lex & Yacc. 2 Compilation Process Lexical Analyzer Source Code Syntax Analyzer Symbol Table Intermed. Code Gen. Code Generator Machine Code.
Lex & Yacc By Hathal Alwageed & Ahmad Almadhor. References *Tom Niemann. “A Compact Guide to Lex & Yacc ”. Portland, Oregon. 18 April 2010 *Levine, John.
ICS312 LEX Set 25. LEX Lex is a program that generates lexical analyzers Converting the source code into the symbols (tokens) is the work of the C program.
1 LEX & YACC Tutorial February 28, 2008 Tom St. John.
COMMONWEALTH OF AUSTRALIA Copyright Regulations 1969 WARNING This material has been reproduced and communicated to you by or on behalf of Monash University.
PL&C Lab, DongGuk University Compiler Lecture Note, MiscellaneousPage 1 Yet Another Compiler-Compiler Stephen C. Johnson July 31, 1978 YACC.
C Chuen-Liang Chen, NTUCS&IE / 35 SCANNING Chuen-Liang Chen Department of Computer Science and Information Engineering National Taiwan University Taipei,
Scanner Generation Using SLK and Flex++ Followed by a Demo Copyright © 2015 Curt Hill.
LECTURE 6 Scanning Part 2. FROM DFA TO SCANNER In the previous lectures, we discussed how one might specify valid tokens in a language using regular expressions.
More yacc. What is yacc – Tool to produce a parser given a grammar – YACC (Yet Another Compiler Compiler) is a program designed to compile a LALR(1) grammar.
ICS611 Lex Set 3. Lex and Yacc Lex is a program that generates lexical analyzers Converting the source code into the symbols (tokens) is the work of the.
LEX SUNG-DONG KIM, DEPT. OF COMPUTER ENGINEERING, HANSUNG UNIVERSITY.
9-December-2002cse Tools © 2002 University of Washington1 Lexical and Parser Tools CSE 413, Autumn 2002 Programming Languages
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture Ahmed Ezzat.
Sung-Dong Kim, School of Computer Engineering, Hansung University
Lecture 2 Lexical Analysis
Lexical Analysis.
NFAs, scanners, and flex.
Tutorial On Lex & Yacc.
CSc 453 Lexical Analysis (Scanning)
Using SLK and Flex++ Followed by a Demo
TDDD55- Compilers and Interpreters Lesson 2
JLex Lecture 4 Mon, Jan 26, 2004.
Introduction to C++ Programming
Review: Compiler Phases:
Subject Name:Sysytem Software Subject Code: 10SCS52
Compiler Structures 3. Lex Objectives , Semester 2,
Appendix B.1 Lex Appendix B.1 -- Lex.
More on flex.
Systems Programming & Operating Systems Unit – III
NFAs, scanners, and flex.
Compiler Design 3. Lexical Analyzer, Flex
Lexical Analysis - Scanner-Contd
Lex Appendix B.1 -- Lex.
Presentation transcript:

Introduction to Lex Fan Wu Department of Computer Science and Engineering Shanghai Jiao Tong University

Overview Lex is a tool for creating lexical analyzers. Lex source is a table of regular expressions and corresponding program fragments. The recognition of the expressions is performed by a deterministic finite automaton. As each expression appears in the input, the corresponding fragment is executed.

Usage Paradigm of Lex The Lex compiler transforms lex.1 to a C program, in a file named lex.yy.c. The latter file is compiled by the C compiler into a file called a.out. The C-compiler output is a working lexical analyzer that can take a stream of input characters and produce a stream of tokens.

Structure of Lex Programs %{ C declarations and includes %} declarations %% translation rules user subroutines

Structure of Lex Programs Cont’d %{…%}: Anything within these brackets i s copied directly to the file lex.yy.c Declarations: variables, identifiers declared to stand for a constant, and regular definitions Translation rules: pattern descriptions and actions Pattern { Action} User subroutines: user-written codes Three parts are separated by %% Remark: The first part defines patterns, the third part defines actions, the second part puts together to express “If we see some pattern, then we do some action”.

Policy for None-translated Source Any source not intercepted by Lex is copied into the generated program. Any line which is not part of a Lex rule or action which begins with a blank or tab Anything included between lines containing only %{ and %} Anything after the second %% delimiter

Position of Copied Source source input prior to the first %% external to any function in the generated code after the first %% and prior to the second %% appropriate place for declarations in the function generated by Lex which contains the actions after the second %% after the Lex generated output

Default Rules and Actions The first and second part must exist, but may be empty, the third part and the second %% are optional. If the third part dose not contain a main(), a default main() will be linked. Unmatched patterns will perform a default action, which copys the input to the output

Simple Lex Source Examples A minimum Lex program: %% It only copies the input to the output unchanged. Deleting three spacing characters: [ \t\n]; Deleting all blanks and tabs at the ends of lines : [ \t]+$;

A Lex Source Example %{ /* * Example lex source file * This first section contains necessary * C declarations and includes * to use throughout the lex specifications. */ #include <stdio.h> %} bin_digit [01] %%

A Lex Source Example Cont’d {bin_digit}* { /* match all strings of 0's and 1's */ /* Print out message with matching text */ printf("BINARY: %s\n", yytext); } ([ab]*aa[ab]*bb[ab]*)|([ab]*bb[ab]*aa[ab]*) { /* match all strings over (a,b) containing aa and bb */ printf("AABB\n"); \n ; /* ignore newlines */

A Lex Source Example Cont’d %% /* * Now this is where you want your main program */ int main(int argc, char *argv[]) { * call yylex to use the generated lexer yylex(); * make sure everything was printed fflush(yyout); exit(0); }

Lex Regular Expressions Elementary Operations single characters except “ \ . $ ^ [ ] - ? * + | ( ) / { } % < > concatenation (putting characters together) alternation (a|b|c) [ab] == a|b [a-k] == a|b|c|...|i|j|k [a-z0-9] == any letter or digit [^a] == any character but a Kleene Closure (*) Positive Closure (+)

Lex Regular Expressions Cont’d Special Operations . matches any single character (except newline) “ and \ quote the part as text \t tab \n newline \b backspace \” double quote \\ \ ? the preceding is optional ab? == a|ab (ab)? == ab|ε

Lex Regular Expressions Cont’d Special Operations Cont’d ^ at the beginning of the line $ at the end of the line, same as /\n [^ ] anything except \"[^\"]*\" is a double quoted string {n,m} m through n occurrences a{1,3} is a or aa or aaa {definition} translation from definition / matches only if followed by right part of / 0/1 the 0 of 01 but not 02 or 03 or … ( ) grouping

Regular Definitions NAME REGUGLAR_EXPRSSION digs [0-9]+ integer {digs} plainreal {digs}"."{digs} expreal {digs}"."{digs}[Ee][+-]?{digs} real {plainreal}|{expreal} NAME must be a valid C identifier {NAME} is replaced by prior regular expression

Regular Definitions Cont’d The definitions can also contain variables and other declarations used by the Code generated by Lex. These usually go at the start of this section, marked by %{ at the beginning and %} at the end or the line which begins with a blank or tab . Includes usually go here. It is usually convenient to maintain a line counter so that error messages can be keyed to the lines in which the errors are found. %{ int linecount = 1; %}

Translation Rules Pattern <white spaces> { program statements } A null statement will ignore the input Four special options: The unmatched token is using a default action that ECHO from the input to the output | indicates that the action for this rule is from the action for the next rule REJECT means going to the next alternative BEGIN means entering a start condition

Transition Rule Example {real} {return FLOAT;} {newline} {linecounter++;} {integer} { printf("I found an integer\n"); return INTEGER; }

Ambiguous Source Rules When more than one expression can match the current input The longest match is preferred. Among rules which matched the same number of characters, the rule given first is preferred. To override the choice, use action REJECT she {s++; REJECT;} he {h++; REJECT;} .|\n ;

Multiple States Lex allows the user to explicitly declare multiple states (in Definitions section) %s COMMENT Default states is INITIAL or 0 Transition rules can be classified into different states, which will be matched depending on the state in BEGIN is used to change state <INITIAL>. {ECHO;} <INITIAL>”/*” {BEGIN COMMENT;} <COMMENT>. {;} <COMMENT>”*/” {BEGIN INITIAL;}

Lex Special Variables yytext -- a string containing the lexeme yyleng -- the length of the lexeme yyin -- the input stream pointer Example: {integer} { printf("I found an integer\n"); sscanf(yytext,"%d", &x); return INTEGER; }

Lex Library Function Calls yylex() default main() contains a return yylex(); yywarp() called by lexical analyzer if end of the input file default yywarp() always return 1 yyless(n) n characters in yytext are retained yymore() the next input expression recognized is to be appended to the end of this input yyless(n) 将当前标识的除了前n个字符之外的都回送给输入流. yymore() 告诉解析器下一次匹配的规则,满足的部分将会添加到当前yytext值得后面.

User Subroutines The actions associated with any given token are normally specified using statements in C. But occasionally the actions are complicated enough that it is better to describe them with a function call, and define the function elsewhere. Definitions of this sort go in the last section of the Lex input.

Example %{ int lengs[100]; }% %% [a-z]+ { lengs[yyleng]++; } . | \n ; yywrap() { int i; printf("Length No. words\n"); for(i=0; i<100; i++) if (lengs[i] > 0) printf("%5d%10d\n",i,lengs[i]); return(1); }

Using Yacc Together with Lex Yacc will call yylex() to get the token from the input so that each Lex rule should end with: return(token); where the appropriate token value is returned.

Resources FLex Manual: http://flex.sourceforge.net/manual/ Doug Brown, John Levine, and Tony Mason, “lex & yacc”, second edition, O'Reilly. Thomas Niemann, “A Compact Guide to Lex & Yacc”. Lex/Yacc Win32 port: http://www.monmouth.com/~wstreett/lex-yacc/lex-yacc.html Parser Generator: www.bumblebeesoftware.com