CPSC 388 – Compiler Design and Construction Scanners – JLex Scanner Generator.

JLex Scanner Generator

1. JLex.Main reads the JLex specification xxx.jlex and writes the scanner source code xxx.jlex.java.
2. javac compiles xxx.jlex.java to Yylex.class.
3. java runs Main.class together with Yylex.class and produces the output of Main.

Yylex Class

class Yylex {
    …
    public Yylex(java.io.Reader i) …
    public Yylex(java.io.InputStream i) …
    …
    public next_token() …
    …
}

.jlex file

A JLex specification has three sections, separated by lines containing only %%:

User code: copied verbatim into the .java file
%%
JLex directives: include macros and parameters for changing how JLex runs
%%
Regular expression rules: explain how to divide up user input into tokens

Regular Expression Rules

Each rule has the form:

    regular-expression { action }

The regular expression is the pattern to be matched; the action is Java code to be executed when the pattern is matched.

When the scanner's next_token() method is called, it repeats:
1. Find the longest sequence of characters in the input (starting with the current character) that matches a pattern.
2. Perform the associated action.

The next_token() method repeats these two steps until an action causes it to return.

If there are several patterns that match the same (longest) sequence of characters, then the first such pattern is considered to be matched, so the order of the patterns can be important.

If an input character is not matched by any pattern, the scanner throws an exception. It is therefore important to make sure that there can be no such unmatched characters, since a scanner should not "crash" on bad input.
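The matching loop described above can be sketched in plain Java. This is only an illustration of the longest-match and first-rule-wins policy, not the code JLex generates: it uses java.util.regex instead of a generated automaton, and the class name, the rule table, and the token names are all hypothetical. The patterns are kept simple (no alternation) so that each regex's greedy match is also its longest match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestMatchSketch {
    // Hypothetical rule table. Order matters: on a tie, the earlier rule wins.
    static final String[] NAMES = {"WHILE", "ID", "INT", "EQEQ", "ASSIGN", "WS"};
    static final Pattern[] PATTERNS = {
        Pattern.compile("while"),
        Pattern.compile("[a-zA-Z][a-zA-Z0-9]*"),
        Pattern.compile("[0-9]+"),
        Pattern.compile("=="),
        Pattern.compile("="),
        Pattern.compile("[ \\t\\n]+")
    };

    // Splits the input into "NAME:lexeme" strings, skipping whitespace.
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestLen = 0;
            int bestRule = -1;
            for (int i = 0; i < PATTERNS.length; i++) {
                Matcher m = PATTERNS[i].matcher(input);
                m.region(pos, input.length());
                // Strictly longer matches replace the current best,
                // so the first rule is kept when lengths are equal.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestRule = i;
                }
            }
            if (bestRule < 0) {
                // No rule matches the current character.
                throw new IllegalStateException("Unmatched character at position " + pos);
            }
            if (!NAMES[bestRule].equals("WS")) {
                tokens.add(NAMES[bestRule] + ":" + input.substring(pos, pos + bestLen));
            }
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("while whiles x==10"));
    }
}
```

In this sketch "while" is matched by both the WHILE and ID rules with the same length, so the earlier WHILE rule wins, while "whiles" goes to ID because that match is longer. A character such as "#" matches no rule and triggers the exception, which is the situation a catch-all rule at the end of the rule list would avoid.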

Regular Expressions

- Most characters match themselves:
    abc    ==    While
- Characters in double quotes match themselves (even special characters):
    "abc"    "=="    "While"    "a|b"

Regular Expressions

- Special characters:
    |     means "or"
    *     means zero or more instances of the preceding expression
    +     means one or more instances of the preceding expression
    ?     means zero or one instance of the preceding expression
    ( )   are used for grouping
    .     means any character except newline
    \     means a special escape character follows (\n, \t, \", \\, etc.)
    ^     means only match at the beginning of a line
    $     means only match at the end of a line

Character Classes

- Characters in square brackets [] create a character class.
- [a-z] matches any single lowercase letter.
- Special characters in character classes:
    -     means range
    ^     means negate the class (only if it is the first character)
    \     escape character (same as before)
    ""    means match the enclosed characters literally (similar to \)
- Example character classes:
    [a b]     matches "a", " ", or "b"
    [a"-"z]   matches "a", "-", or "z"
    [a\n]     matches "a" or newline (\n)
    [^a-z]    matches any character except a lowercase letter
- Whitespace outside character classes terminates the regular expression.
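The example classes above can be tried out with Java's own java.util.regex, which treats ranges, negation, and backslash escapes inside brackets in much the same way. This is a rough analogy rather than JLex itself: quoting inside a class (as in [a"-"z]) is JLex syntax, so the sketch below escapes the hyphen with a backslash instead. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class CharClassSketch {
    // Returns true when the whole string s is matched by the class expression.
    static boolean matches(String charClass, String s) {
        return Pattern.matches(charClass, s);
    }

    public static void main(String[] args) {
        System.out.println(matches("[a b]", " "));     // space is a member of the class
        System.out.println(matches("[a\\-z]", "-"));   // escaped hyphen is a literal, not a range
        System.out.println(matches("[a\\-z]", "m"));   // so "m" is not in the class
        System.out.println(matches("[a\\n]", "\n"));   // "a" or newline
        System.out.println(matches("[^a-z]", "A"));    // negated class
        System.out.println(matches("[^a-z]", "q"));
    }
}
```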

JLex Directives

- Directives include:
    specifying the value that should be returned on end-of-file
    specifying that line counting should be turned on
    specifying that the scanner will be used with the Java parser generator CUP (java_cup)
- The directives part also includes macro definitions.

Macro Definitions

- The form of a macro definition is:
    name = regular-expression
  where name is any valid Java identifier and regular-expression is any regular expression. For example:
    DIGIT=       [0-9]
    LETTER=      [a-zA-Z]
    WHITESPACE=  [ \t\n]
- Macros can be used in regular expressions and in other macros (only macros that were previously defined).
- To use a macro, enclose its name in curly braces {}:
    {LETTER}({LETTER}|{DIGIT})*
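One way to see what macro use amounts to is plain textual substitution. The sketch below expands {NAME} references into a java.util.regex pattern and checks the identifier example against a few strings. It is only an illustration under assumptions: JLex performs its own expansion internally, and the class and method names here are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class MacroExpansionSketch {
    // The three macros from the example, in definition order.
    static Map<String, String> slideMacros() {
        Map<String, String> macros = new LinkedHashMap<>();
        macros.put("DIGIT", "[0-9]");
        macros.put("LETTER", "[a-zA-Z]");
        macros.put("WHITESPACE", "[ \\t\\n]");
        return macros;
    }

    // Replaces every {NAME} with the parenthesized macro body.
    static String expand(String regex, Map<String, String> macros) {
        String result = regex;
        for (Map.Entry<String, String> e : macros.entrySet()) {
            result = result.replace("{" + e.getKey() + "}", "(" + e.getValue() + ")");
        }
        return result;
    }

    // True when the whole string is an identifier according to the example rule.
    static boolean isIdentifier(String s) {
        String regex = expand("{LETTER}({LETTER}|{DIGIT})*", slideMacros());
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        System.out.println(expand("{LETTER}({LETTER}|{DIGIT})*", slideMacros()));
        System.out.println(isIdentifier("x1"));
        System.out.println(isIdentifier("1x"));
    }
}
```

Wrapping each macro body in parentheses keeps operators such as * applied to the whole macro rather than to its last character.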

Symbol  Meaning in RE                                     Meaning in Char Class
(       Matches with ) to group sub-expressions.          Represents itself.
)       Matches with ( to group sub-expressions.          Represents itself.
[       Begins a character class.                         Represents itself.
]       Is illegal.                                       Ends a character class.
{       Matches with } to delimit a macro name.
}       Matches with { to delimit a macro name.           Represents itself or matches with { to
                                                          delimit a macro name.
"       Matches with " to delimit strings (only \ is      Matches with " for a string of chars that
        special within strings).                          belong to the char class. Only \" is special
                                                          within the string.
\       Escapes special characters (n, t, etc.). Also     Escapes characters that are special inside
        used for unicode/hex/octal escapes.               a character class.
.       Matches any character except newline.             Matches itself.
|       Or.                                               Matches itself.
*       Kleene closure.                                   Matches itself.
+       One or more matches.                              Matches itself.
?       Zero or one matches.                              Matches itself.
^       Matches only at beginning of line.                Complements chars in class.
$       Matches only at end of line.                      Matches itself.
-       Matches itself.                                   Range of characters.

Set Up and Use JLex

- Download JLex.jar and java_cup.jar and save them in a directory, for example "/home/hccs/software".
- Set up CLASSPATH to include JLex.jar and java_cup.jar:
    Edit the .bashrc file in your home directory (for example, "/home/hccs").
    Add or change the line that sets the CLASSPATH variable, e.g.
      export CLASSPATH=.:/home/hccs/software/JLex.jar:/home/hccs/software/java_cup.jar
    Close any terminal windows and open a new one.
- Create the xxx.jlex file.
- Run "java JLex.Main xxx.jlex"; you should get xxx.jlex.java.
- Run "javac xxx.jlex.java"; you should get Yylex.class (the scanner).

More Info On JLex