
Review: Regular expression: how do we define it? Given an alphabet Σ:
Base case:
– ε is a regular expression that denotes {ε}, the set that contains only the empty string.
– For each a in Σ, a is a regular expression denoting {a}, the set containing the string a.
Induction case:
– If r and s are regular expressions denoting the languages (sets) L(r) and L(s), then
  » (r) | (s) is a regular expression denoting L(r) ∪ L(s)
  » (r)(s) is a regular expression denoting L(r)L(s) (concatenation)
  » (r)* is a regular expression denoting (L(r))*
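For example, over the alphabet {a, b}, the expression (a|b)*abb is built from the base cases a and b using union, concatenation and closure, and denotes the set of all strings of a's and b's that end in abb (abb, aabb, babb, ababb, ...).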

Lex -- a Lexical Analyzer Generator (by M. E. Lesk and E. Schmidt)
A Lex source program has three sections:
{definitions}
%%
{rules}
%%
{user subroutines}
Rules: each regular expression specifies a token. Default action for anything that is not matched: copy it to the output.
Action: a C source fragment specifying what to do when a token is recognized.
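A minimal sketch of this three-section layout (the pattern and the printed message are illustrative, not taken from the slides):

%{
#include <stdio.h>              /* definitions section: C code copied verbatim */
%}
digit   [0-9]
%%
{digit}+    { printf("NUMBER: %s\n", yytext); /* rule: pattern + C action */ }
[ \t\n]+    { /* skip whitespace */ }
%%
int yywrap(void) { return 1; }     /* user subroutines section */
int main(void)   { yylex(); return 0; }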

Lex program examples: ex1.l and ex2.l
– 'lex ex1.l' produces the lex.yy.c file.
– The int yylex() routine is the scanner that finds all the regular expressions specified.
  yylex() normally returns a non-zero value (usually a token id).
  yylex() returns 0 when the end of file is reached.
  You need a driver to test the routine.
– You need to have a yywrap() function in the lex file (return 1). It is called at end of input and only matters when scanning several input files in a row; returning 1 means there is no further input.
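A small external driver is enough to exercise yylex(); a sketch (the file name driver.c is hypothetical, and the token ids are whatever the actions in ex1.l return):

/* driver.c -- compile together with the generated scanner: cc lex.yy.c driver.c */
#include <stdio.h>

extern int yylex(void);            /* defined in lex.yy.c, produced by 'lex ex1.l' */

int main(void)
{
    int token;
    while ((token = yylex()) != 0)     /* yylex() returns 0 at end of file */
        printf("token id: %d\n", token);
    return 0;
}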

Lex regular expressions contain text characters and operators.
– Letters of the alphabet and digits are always text characters. The regular expression integer matches the string "integer".
– Operators: " \ [ ] ^ - ? . * + | ( ) $ / { } % < >
  When these characters appear in a regular expression, they have special meanings.

– Operators (characters that have special meanings): " \ [ ] ^ - ? . * + | ( ) $ / { } % < >
  '*', '+', '|', '(', ')' -- used as in ordinary regular expressions.
  '"' -- any character between quotes is a text character.
    E.g.: "xyz++" == xyz"++"
  '\' -- escape character.
    To get the operators back as text: "xyz++" == ??
    To specify special characters: \40 == " " (a space)
  '[' and ']' -- used to specify a set of characters.
    e.g.: [a-z], [a-zA-Z]
    Every character inside except ^, - and \ is a text character.
    [-+0-9], [\40-\176]
  '^' -- not, when used as the first character after the left bracket.
    E.g. [^abc] -- everything except a, b or c.
    [^a-zA-Z] -- ??
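To answer the two '??' questions above (standard lex behaviour): "xyz++" can equivalently be written with escapes as xyz\+\+, and [^a-zA-Z] matches any single character that is not a letter. A few illustrative rules (the token names are hypothetical):

"xyz++"       { /* quoted: the +'s are literal text           */ return XYZPP; }
xyz\+\+       { /* escaped: matches the same lexeme "xyz++"   */ return XYZPP; }
[^a-zA-Z]     { /* any single character that is not a letter  */ return NONLETTER; }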

– Operators (characters that have special meanings): " \ [ ] ^ - ? . * + | ( ) $ / { } % < >
  '.' -- any character except newline.
  '?' -- optional: ab?c matches 'ac' or 'abc'.
  '/' -- trailing context (lookahead):
    e.g. ab/cd -- matches ab only if it is followed by cd.
  '{' and '}' -- enclose the name of a regular definition, e.g. {digit}.
  '%' -- has special meaning in lex (section separator).
  '$' -- match at the end of a line; '^' -- match at the beginning of a line.
    ab$ == ab/\n
  '<' and '>' -- start conditions (more context-sensitivity support; see the paper for details).
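A few small examples of these operators in rule form (the patterns are illustrative and the actions are left empty):

ab?c      { /* matches "ac" or "abc" */ }
ab/cd     { /* matches "ab", but only when "cd" follows; yytext is just "ab" */ }
^#.*      { /* a whole line beginning with '#' */ }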

– Order of pattern matching:
  Always match the longest possible lexeme.
  When multiple patterns match the same (longest) lexeme, use the first pattern listed.
  To override this, add "REJECT" in the action.
...
%%
Ab                            { printf("rule 1\n"); }
Abc                           { printf("rule 2\n"); }
{letter}({letter}|{digit})*   { printf("rule 3\n"); }
%%
Input: Abc
What happens when '.*' is used as a pattern?
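For the input Abc, rules 2 and 3 both match the longest lexeme (all three characters); rule 2 wins because it is listed first, so "rule 2" is printed. A bare .* pattern is dangerous: since lex always prefers the longest match, .* swallows everything up to the end of the line and hides the more specific rules.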

– Manipulating the lexeme and/or the input stream:
  yytext -- a char pointer pointing to the matched string (the lexeme).
  yyleng -- the length of the matched string.
  I/O routines to manipulate the input stream:
  – input() -- get the next character from the input stream; returns <= 0 at the end of the input stream, the character otherwise.
  – unput(c) -- put c back onto the input stream.
  – Dealing with comments ( /* ... */ ):
    » "/*".*"*/" ???
%%
...
"/*"    { int c1, c2;
          c2 = input();
          if (c2 <= 0) { lex_error("unfinished comment"); }
          else {
              c1 = c2; c2 = input();
              while (((c1 != '*') || (c2 != '/')) && (c2 > 0)) {
                  c1 = c2; c2 = input();
              }
              if (c2 <= 0) { lex_error("unfinished comment"); }
          }
        }
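Why is "/*".*"*/" not the right pattern? Two reasons: '.' does not match newline, so multi-line comments are missed, and matching is greedy, so on a line like /* a */ x = 1; /* b */ it would match from the first /* to the last */. A single-pattern alternative that is often used instead (shown as a sketch, not from the slides):

"/*"([^*]|"*"+[^*/])*"*"+"/"    { /* skip the comment, including multi-line ones */ }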

– Reporting errors: what kinds of errors? Not too many:
  – characters that cannot lead to any token
  – unterminated comments (can we detect these in later phases?)
  – unterminated string constants.
  How do we keep track of the current position (which line, which column)?
  – Use two global variables for this: yyline, yycolumn.
%{
  int yyline = 1, yycolumn = 1;
%}
...
%%
[ \t\n]+                      { /* do nothing */ }
if                            { return (IFNumber); }
"+"                           { return (PLUSNumber); }
{letter}({letter}|{digit})*   { yylval = idtable_insert(yytext); return (IDNumber); }
...
%%
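One way to actually maintain yyline and yycolumn is to update them in the actions; a minimal sketch using the two variables declared above (the token rule shown is the same illustrative one as on the slide):

\n        { yyline++; yycolumn = 1; }
[ \t]+    { yycolumn += yyleng; }
if        { yycolumn += yyleng; return (IFNumber); }

Every other token rule would similarly add yyleng to yycolumn before returning.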

– Reporting errors:
  How do we report a character that cannot lead to any token?
  How do we deal with an unterminated comment?
  How do we deal with an unterminated string?
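A common answer to the first question is a catch-all rule placed last in the specification, so it fires only when no other pattern matches; a sketch, assuming the yyline counter from the previous slide:

.    { fprintf(stderr, "line %d: illegal character '%s'\n", yyline, yytext); }

The unterminated-comment and unterminated-string cases are detected inside the corresponding actions, as in the input()/yymore() code shown on the neighbouring slides.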

Dealing with identifiers and string constants.
– Data structures:
  A string table that stores the lexeme values.
  To avoid inserting the same lexeme multiple times, we maintain an id table that records all identifiers found. The id table holds pointers (indices) into the string table.
– Implementation of the id table: hash table, linked list, tree, ...
– The hash table implementation from the textbook.
  Figure: the string table holds the lexemes cp, n, match, last, i and j stored contiguously, each terminated by '\0': c p \0 n \0 m a t c h \0 l a s t \0 i \0 j \0

Some code pieces for the id table:

#define STRINGTABLELENGTH
#define PRIME 997

struct HashItem {
    int index;                    /* index of the lexeme in StringTable */
    struct HashItem *next;
};

struct HashItem *HashTable[PRIME];
char StringTable[STRINGTABLELENGTH];
int StringTableIndex = 0;

int HashFunction(char *s);        /* copy from page 436 */
int HashInsert(char *s);
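The bodies of HashFunction and HashInsert are left to the textbook; a minimal sketch of what they might look like, using the declarations above (an assumption-laden sketch, not the textbook's code):

#include <stdlib.h>
#include <string.h>

int HashFunction(char *s)                      /* simple string hash, reduced mod PRIME */
{
    unsigned h = 0;
    while (*s)
        h = h * 65599u + (unsigned char)*s++;
    return (int)(h % PRIME);
}

int HashInsert(char *s)                        /* return the StringTable index of s, inserting if new */
{
    int h = HashFunction(s);
    struct HashItem *p;
    for (p = HashTable[h]; p != NULL; p = p->next)
        if (strcmp(&StringTable[p->index], s) == 0)
            return p->index;                   /* lexeme already stored: reuse it */
    p = (struct HashItem *) malloc(sizeof(struct HashItem));
    p->index = StringTableIndex;
    p->next = HashTable[h];
    HashTable[h] = p;
    strcpy(&StringTable[StringTableIndex], s); /* assumes enough room is left */
    StringTableIndex += (int) strlen(s) + 1;   /* keep the terminating '\0' */
    return p->index;
}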

– Internal representation of string constants: needs conversion for the special characters.
  "abc"       ==> 'a' 'b' 'c' '\0'
  "abc\"def"  ==> 'a' 'b' 'c' '"' 'd' 'e' 'f' '\0'
  "abc\n"     ==> 'a' 'b' 'c' '\n'
– Recognizing constant strings with special characters:
  Assuming a string cannot cross a line boundary.
  Use yymore(), which appends the next match to the current yytext:

\"[^"\n]*    { char c = input();
               if (c != '"')
                   error();                 /* string not terminated on this line */
               else if (yytext[yyleng-1] == '\\') {
                   unput(c);                /* the closing quote was escaped */
                   yymore();                /* keep accumulating into yytext  */
               } else {
                   /* found the whole string: normal processing */
               }
             }
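Once the whole lexeme has been accumulated, a small helper can build the internal representation; a sketch handling only the escapes shown above (the function name convert_string and its interface are assumptions):

void convert_string(const char *lexeme, char *out)
{
    const char *p = lexeme;
    if (*p == '"') p++;                    /* skip the opening quote, if present */
    while (*p != '\0' && *p != '"') {      /* stop at an unescaped quote or end  */
        if (*p == '\\') {                  /* escape sequence */
            p++;
            if (*p == '\0') break;         /* lone backslash at the end: ignore  */
            switch (*p) {
            case 'n':  *out++ = '\n'; break;
            case '"':  *out++ = '"';  break;
            case '\\': *out++ = '\\'; break;
            default:   *out++ = *p;   break;
            }
        } else {
            *out++ = *p;
        }
        p++;
    }
    *out = '\0';
}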

Putting it all together: check out the token.l program.