JLex Lecture 4 Mon, Jan 26, 2004.

JLex

JLex is a lexical analyzer generator for Java. It is based on the well-known lex, a lexical analyzer generator for C. JLex reads a description of a set of tokens and outputs a Java program that will process those tokens.

The JLex Input File

By convention, the input file to JLex uses the extension .lex. The file is divided into three parts:

  1. User code
  2. JLex directives
  3. Regular expression rules

These three sections are separated by %%.
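
As a rough illustration of the three-part layout, the Java sketch below splits the text of a specification at its %% separators. SpecSections and split are hypothetical helpers, not part of JLex, and the sketch assumes each %% stands alone on its own line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of JLex): splits a specification into its
// user-code, directives, and rules sections. Assumes each %% separator
// stands alone on its own line.
class SpecSections {
    static List<String> split(String spec) {
        List<String> sections = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : spec.split("\n", -1)) {
            // Only the first two %% lines act as separators; anything after
            // them belongs to the rules section.
            if (line.trim().equals("%%") && sections.size() < 2) {
                sections.add(current.toString());
                current.setLength(0);
            } else {
                current.append(line).append('\n');
            }
        }
        sections.add(current.toString());
        return sections;
    }

    public static void main(String[] args) {
        String spec = String.join("\n",
            "import java.io.*;",        // user code
            "%%",
            "%integer",                 // a directive
            "%%",
            "[0-9]+ { return 1; }");    // a rule and its action
        List<String> parts = split(spec);
        System.out.println(parts.size()); // 3
        System.out.print(parts.get(1));   // %integer
    }
}
```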

JLex User Code

See Section 2.1 of the JLex User’s Manual. Any code written in the user-code section is copied directly into the Java source file created by JLex. JLex creates a class named Yylex, which is at the heart of the lexer. The user code is not incorporated into this class; it is copied to the top of the file, ahead of the class, which makes this section the place for import statements.

JLex Directives

See Section 2.2 of the JLex User’s Manual. Any code bracketed within %{ and %} is copied directly into the Yylex class, at the beginning. Although this code is incorporated into the Yylex class, it is not incorporated into any Yylex member function. Thus, we may use it to define Yylex member variables or additional member functions.
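
To picture where the user code and the %{ %} code land, here is a hand-written, heavily simplified Java file shaped like JLex output. It is not actual JLex output: the real Yylex also contains buffers, tables, constructors, and the yylex() method. The symbols field and intern method are hypothetical examples of what one might place inside %{ and %}.

```java
// User code section: copied verbatim to the top of the output file,
// outside the lexer class, so import statements belong here.
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the generated class (not real JLex output).
class Yylex {
    // %{ ... %} code: copied into the class body, outside every method,
    // so it may declare fields and extra member functions.
    // Hypothetical example: a symbol table mapping identifiers to ids.
    private final Map<String, Integer> symbols = new HashMap<>();

    int intern(String name) {
        Integer id = symbols.get(name);
        if (id == null) {            // first occurrence: assign the next id
            id = symbols.size();
            symbols.put(name, id);
        }
        return id;
    }

    // The generated tables and yylex() would follow here in real output.

    // Demo only.
    public static void main(String[] args) {
        Yylex lexer = new Yylex();
        System.out.println(lexer.intern("count") + " "
            + lexer.intern("total") + " " + lexer.intern("count")); // 0 1 0
    }
}
```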

The init Directive

Code bracketed within %init{ and %init} is copied into the Yylex default (no-argument) constructor, which is called by the other constructors. For example:

%init{
    System.out.println("In the constructor");
%init}
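
The constructor arrangement described above can be sketched in Java as follows. This is a simplified, hypothetical rendering: buffer setup is omitted, and initCalls is added only to make the effect observable. It shows that whichever stream constructor is used, this() runs the %init code exactly once.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

// Simplified, hypothetical rendering of the constructor chain. Both stream
// constructors delegate to the no-argument constructor, where the %init
// code is pasted.
class Yylex {
    private BufferedReader reader;
    int initCalls = 0; // added only to make the %init effect observable

    Yylex(Reader r) {
        this();                              // runs the %init code first
        reader = new BufferedReader(r);
    }

    Yylex(InputStream in) {
        this();                              // runs the %init code first
        reader = new BufferedReader(new InputStreamReader(in));
    }

    private Yylex() {
        // Code from %init{ ... %init} is copied here:
        System.out.println("In the constructor");
        initCalls++;
    }

    public static void main(String[] args) {
        Yylex lexer = new Yylex(System.in); // prints "In the constructor" once
        System.out.println(lexer.initCalls); // 1
    }
}
```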

The eof Directive

Code bracketed within %eof{ and %eof} is copied into the Yylex function yy_do_eof(), which is called once upon end of file. For example:

%eof{
    System.out.println("In yy_do_eof()");
%eof}
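
One way to honor "called once upon end of file" is a guard flag, as in the simplified sketch below. The names eofDone and eofRuns are assumptions for illustration, not taken from JLex; the point is only that repeated calls after end of input run the %eof code a single time.

```java
// Simplified sketch of the "once upon end of file" guarantee. The flag
// eofDone and the counter eofRuns are hypothetical names.
class Yylex {
    private boolean eofDone = false;
    int eofRuns = 0;

    void yy_do_eof() {
        if (!eofDone) {
            // Code from %eof{ ... %eof} is copied here:
            System.out.println("In yy_do_eof()");
            eofRuns++;
        }
        eofDone = true;
    }

    public static void main(String[] args) {
        Yylex lexer = new Yylex();
        lexer.yy_do_eof();
        lexer.yy_do_eof();                 // second call does nothing
        System.out.println(lexer.eofRuns); // 1
    }
}
```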

JLex Token Types

Unless we specify otherwise, the data type of the returned tokens is Yytoken. This class is not created automatically; we must define it ourselves.

  The directive %integer changes the return type to int.
  The directive %intwrap changes the return type to Integer.
  The directive %type, followed by a type name, sets the return type to any other type.
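
Because Yytoken is not generated, a minimal version must be supplied, for example in the user-code section or in its own source file. The fields below (kind, text, line) are a hypothetical choice; the rule actions just need to construct the class consistently.

```java
// Hypothetical minimal Yytoken. The fields are our own choice, not
// something JLex requires.
class Yytoken {
    final int kind;      // which token this is (e.g., an integer code)
    final String text;   // the matched lexeme
    final int line;      // line number, useful for error messages

    Yytoken(int kind, String text, int line) {
        this.kind = kind;
        this.text = text;
        this.line = line;
    }

    @Override
    public String toString() {
        return "Yytoken(" + kind + ", \"" + text + "\", line " + line + ")";
    }

    public static void main(String[] args) {
        // A rule action might build one of these from the matched text.
        System.out.println(new Yytoken(3, "42", 1)); // Yytoken(3, "42", line 1)
    }
}
```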

JLex Token Types

If the return type is Yytoken or Integer, then the EOF token is null. If the return type is int, then the EOF token is -1. For any other type, we need to specify the EOF value.

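JLex EOF Value

By using the %eofval directive, we may indicate what value to return upon EOF. We write

%eofval{
    return new type(value);
%eofval}

where type and value stand for the actual token class and the constructor arguments for the EOF token.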
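
With %type, a sentinel object returned from %eofval can replace a null check in the caller. The sketch below assumes a hypothetical Token class with an EOF kind (so the %eofval body would be something like return new Token(Token.EOF, "");) and uses a plain iterator as a stand-in for the lexer, so it runs without JLex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical token class for use with "%type Token".
class Token {
    static final int EOF = 0;
    static final int NUM = 1;
    final int kind;
    final String text;

    Token(int kind, String text) { this.kind = kind; this.text = text; }
}

class EofDemo {
    // Stand-in for a lexer: hands out the given tokens, then the EOF
    // sentinel (what the %eofval code would produce) on every later call.
    static Token next(Iterator<Token> source) {
        return source.hasNext() ? source.next() : new Token(Token.EOF, "");
    }

    // Collect token texts until the sentinel appears; no null check needed.
    static List<String> drain(Iterator<Token> source) {
        List<String> texts = new ArrayList<>();
        for (Token t = next(source); t.kind != Token.EOF; t = next(source)) {
            texts.add(t.text);
        }
        return texts;
    }

    public static void main(String[] args) {
        List<Token> toks = Arrays.asList(new Token(Token.NUM, "1"), new Token(Token.NUM, "2"));
        System.out.println(drain(toks.iterator())); // [1, 2]
    }
}
```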

JLex Regular Expression Rules

Each regular expression rule consists of a regular expression followed by an associated action. The associated action is a segment of Java code enclosed in braces { }. Typically, the action returns the appropriate token.
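
The rules-and-actions idea can be mimicked with java.util.regex to see how a rule set behaves on input. This is a swapped-in technique for illustration only: JLex itself compiles the rules into a DFA rather than trying each pattern in turn. The sketch assumes the usual lex policy of taking the longest match and breaking ties in favor of the rule listed first; RuleDemo, Rule, and scan are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuleDemo {
    // A rule: a regular expression plus an action that returns a token,
    // or null when the matched text should be skipped.
    static final class Rule {
        final Pattern pattern;
        final Function<String, String> action;

        Rule(String regex, Function<String, String> action) {
            this.pattern = Pattern.compile(regex);
            this.action = action;
        }
    }

    static List<String> scan(String input, List<Rule> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestLen = 0;
            Rule best = null;
            for (Rule r : rules) {
                Matcher m = r.pattern.matcher(input).region(pos, input.length());
                // Strict > keeps the earlier rule on ties; empty matches are ignored.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    best = r;
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("no rule matches at position " + pos);
            }
            String token = best.action.apply(input.substring(pos, pos + bestLen));
            if (token != null) tokens.add(token);
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(
            new Rule("if", s -> "IF"),                 // keyword, listed first
            new Rule("[a-z]+", s -> "ID(" + s + ")"),  // identifiers
            new Rule("[0-9]+", s -> "NUM(" + s + ")"), // integers
            new Rule("[ \\t\\n]+", s -> null));        // whitespace: no token
        System.out.println(scan("if iffy 42", rules)); // [IF, ID(iffy), NUM(42)]
    }
}
```

Here iffy becomes an identifier because the longer match wins, while a bare if ties between the first two rules and goes to the keyword rule listed first.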

JLex Regular Expressions

Regular expressions are expressed using ASCII characters (codes 0 to 127). The following characters are metacharacters:

? * + | ( ) ^ $ . [ ] { } " \

Metacharacters have special meaning; they do not represent themselves. All other characters represent themselves.

JLex Regular Expressions

Let r and s be regular expressions.

  r? matches zero or one occurrences of r.
  r* matches zero or more occurrences of r.
  r+ matches one or more occurrences of r.
  r|s matches r or s.
  rs matches r concatenated with s.
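
The operators ?, *, +, | and concatenation have the same meaning in Java's java.util.regex, so Pattern.matches, which tests whether the whole string matches, is a convenient way to experiment with them. Other syntax, notably quoting, differs between the two notations, so this is an analogy rather than JLex itself. OperatorDemo and m are hypothetical names.

```java
import java.util.regex.Pattern;

class OperatorDemo {
    // True if the entire string s matches the regular expression.
    static boolean m(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        System.out.println(m("ab?", "a"));    // r?  : zero or one b, true
        System.out.println(m("ab*", "abbb")); // r*  : zero or more b, true
        System.out.println(m("ab+", "a"));    // r+  : needs at least one b, false
        System.out.println(m("a|b", "b"));    // r|s : either alternative, true
        System.out.println(m("ab", "ab"));    // rs  : concatenation, true
    }
}
```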

JLex Regular Expressions

Parentheses are used for grouping. For example, ("+"|"-")? matches an optional plus or minus sign.

  If a regular expression begins with ^, then it is matched only at the beginning of a line.
  If a regular expression ends with $, then it is matched only at the end of a line.
  The dot . matches any character except newline.
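
In java.util.regex the same ideas look like this. The quoting differs: JLex writes ("+"|"-")? while Java would write [+-]? or (\+|-)?. The MULTILINE flag makes ^ and $ match at line boundaries. Java's default dot also excludes \r and a few other line terminators besides \n, so it is close to, but not identical with, JLex's dot. AnchorDemo and its members are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Optional sign followed by digits: grouping plus the ? operator.
    static final Pattern SIGNED_INT = Pattern.compile("[+-]?[0-9]+");

    // With MULTILINE, ^ and $ match at the start and end of each line.
    static final Pattern LINE_COMMENT = Pattern.compile("^#.*$", Pattern.MULTILINE);

    // Count lines that begin with #; a # in mid-line does not count.
    static int countComments(String text) {
        int n = 0;
        Matcher m = LINE_COMMENT.matcher(text);
        while (m.find()) n++;
        return n;
    }

    public static void main(String[] args) {
        System.out.println(SIGNED_INT.matcher("-17").matches());               // true
        System.out.println(countComments("# a\nx = 1 # not at start\n# b\n")); // 2
        System.out.println(Pattern.matches(".", "\n"));  // false: dot skips newline
    }
}
```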

JLex Regular Expressions

Brackets [ ] match any single character listed within the brackets.

  [abc] matches a or b or c.
  [A-Za-z] matches any letter.

If the first character after [ is ^, then the brackets match any character except those listed. For example, [^A-Za-z] matches any non-letter.
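
For these examples, bracket expressions carry over to java.util.regex unchanged, so the three patterns above can be checked directly. ClassDemo and is are hypothetical names.

```java
import java.util.regex.Pattern;

class ClassDemo {
    static final Pattern ABC = Pattern.compile("[abc]");
    static final Pattern LETTER = Pattern.compile("[A-Za-z]");
    static final Pattern NON_LETTER = Pattern.compile("[^A-Za-z]");

    // True if the single character c matches the bracket expression p.
    static boolean is(Pattern p, char c) {
        return p.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        System.out.println(is(ABC, 'b'));        // true
        System.out.println(is(LETTER, 'Q'));     // true
        System.out.println(is(NON_LETTER, '7')); // true
        System.out.println(is(NON_LETTER, 'z')); // false
    }
}
```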

JLex Regular Expressions

A character within double quotes " " represents itself. Metacharacters lose their special meaning and represent themselves when they appear within double quotes. For example, "?" matches ?.
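
Java's nearest counterpart to this quoting is Pattern.quote, which returns a pattern that matches its argument literally. The sketch below (QuoteDemo and literal are hypothetical names) contrasts a quoted and an unquoted +.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // True if input is exactly the literal text, metacharacters included.
    static boolean literal(String text, String input) {
        return Pattern.matches(Pattern.quote(text), input);
    }

    public static void main(String[] args) {
        System.out.println(literal("?", "?"));             // true: ? is literal
        System.out.println(literal("a+b", "a+b"));         // true: + is literal
        System.out.println(Pattern.matches("a+b", "a+b")); // false: + means "one or more"
    }
}
```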

JLex Escape Sequences

Some escape sequences:

  \n matches newline.
  \b matches backspace.
  \r matches carriage return.
  \t matches tab.
  \f matches formfeed.

If c is not a special escape-sequence character, then \c matches c.
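
The table above can be written as a small decoding function. Escapes and decode are hypothetical names; the helper handles only the five escapes listed, plus the fallback rule that \c stands for c. One trap when moving between notations: inside a java.util.regex pattern, \b means a word boundary, not backspace, whereas in a Java char or string literal '\b' is the backspace character.

```java
class Escapes {
    // Given the character after a backslash, return the character the
    // escape sequence stands for.
    static char decode(char c) {
        switch (c) {
            case 'n': return '\n'; // newline, code 10
            case 'b': return '\b'; // backspace, code 8
            case 'r': return '\r'; // carriage return, code 13
            case 't': return '\t'; // tab, code 9
            case 'f': return '\f'; // formfeed, code 12
            default:  return c;    // \c matches c itself, e.g. \. or \"
        }
    }

    public static void main(String[] args) {
        System.out.println((int) decode('n')); // 10
        System.out.println(decode('.'));       // .
    }
}
```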

Running JLex

The lexical analyzer generator is the Main class in the JLex folder. To create a lexical analyzer from the file filename.lex, type

java JLex.Main filename.lex

This produces a file filename.lex.java, which must then be compiled (for example, with javac filename.lex.java) to create the lexical analyzer.

Running the Lexical Analyzer

To run the lexical analyzer, a Yylex object must first be created. The Yylex constructor has one parameter specifying an input stream. For example:

Yylex lexer = new Yylex(System.in);

Each subsequent call to the yylex() member function returns the next token:

token = lexer.yylex();

Because yylex() may throw java.io.IOException, the call must appear in code that catches or declares that exception.
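
The calling pattern can be sketched as a complete Java program. Since no JLex output is available here, Yylex and Yytoken below are hypothetical stand-ins (this Yylex simply returns whitespace-separated words); a real project would use the generated Yylex and its own Yytoken. The loop relies on the default EOF value for Yytoken being null, and it handles the IOException that yylex() may throw.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the user-supplied token class.
class Yytoken {
    final String text;
    Yytoken(String text) { this.text = text; }
}

// Hypothetical stand-in for the JLex-generated lexer: each call returns the
// next whitespace-separated word, or null at end of input.
class Yylex {
    private final InputStream in;

    Yylex(InputStream in) { this.in = in; }

    Yytoken yylex() throws IOException {
        StringBuilder word = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (word.length() > 0) break; // whitespace ends a word
                // otherwise skip leading whitespace
            } else {
                word.append((char) c);
            }
        }
        return word.length() > 0 ? new Yytoken(word.toString()) : null;
    }
}

class Driver {
    // Create the lexer once, then call yylex() until it returns null (EOF).
    static List<String> tokenize(InputStream in) {
        Yylex lexer = new Yylex(in);
        List<String> texts = new ArrayList<>();
        try {
            Yytoken token;
            while ((token = lexer.yylex()) != null) {
                texts.add(token.text);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return texts;
    }

    public static void main(String[] args) {
        for (String t : tokenize(System.in)) {
            System.out.println(t);
        }
    }
}
```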