Scanning with JFlex



Material taught in lecture
Scanner specification language: regular expressions
Scanner generation using automata theory + extra book-keeping
Compiler phases shown: Scanner -> Parser -> Semantic Analysis -> Code Generation

Scanning Scheme programs
Scheme program text:
    (define foo (lambda (x) (+ x 14)))
Tokens, each reported as LINE: ID(VALUE):
    L_PAREN SYMBOL(define) SYMBOL(foo) L_PAREN SYMBOL(lambda) L_PAREN SYMBOL(x) R_PAREN ...
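To make the text-to-tokens step concrete, here is a minimal hand-written sketch in Java. It is not the scanner from the lecture: the class name SchemeScanSketch is hypothetical, and classifying digit-only lexemes as NUMBER is an assumption, since the slide only shows L_PAREN, R_PAREN and SYMBOL.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch of a Scheme scanner. Token names follow the slide,
// except NUMBER, which is an assumption made for this example.
class SchemeScanSketch {

    // Returns tokens formatted as "LINE: ID(VALUE)"; parentheses carry no value.
    static List<String> scan(String text) {
        List<String> tokens = new ArrayList<>();
        int line = 1;
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\n') {
                line++;
                i++;
            } else if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '(') {
                tokens.add(line + ": L_PAREN");
                i++;
            } else if (c == ')') {
                tokens.add(line + ": R_PAREN");
                i++;
            } else {
                // Consume a maximal run of non-delimiter characters.
                int start = i;
                while (i < text.length() && !isDelimiter(text.charAt(i))) {
                    i++;
                }
                String lexeme = text.substring(start, i);
                boolean allDigits = lexeme.chars().allMatch(Character::isDigit);
                tokens.add(line + ": " + (allDigits ? "NUMBER" : "SYMBOL") + "(" + lexeme + ")");
            }
        }
        return tokens;
    }

    private static boolean isDelimiter(char c) {
        return c == '(' || c == ')' || Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        for (String token : scan("(define foo (lambda (x) (+ x 14)))")) {
            System.out.println(token);
        }
    }
}
```

A generated scanner does the same job, but the character-by-character decisions are derived from regular expressions instead of being written by hand.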

Scanner implementation
What are the outputs on the following inputs?
    ifelse
    if a
    .75
    89
    89.94

Lexical analysis with JFlex
JFlex: fast lexical analyzer generator
Recognizes lexical patterns in text
Breaks input character stream into tokens
Input: scanner specification file
Output: a lexical analyzer (scanner), which is a Java program
Workflow: Scheme.lex -> JFlex -> Lexer.java -> javac -> lexical analyzer, which turns text into tokens

JFlex spec. file
The file has three sections, separated by %%:

User code
    Copied directly to the Java file
    Possible source of javac errors down the road
%%
JFlex directives
    Define macros, state names
    Examples: DIGIT= [0-9]   LETTER= [a-zA-Z]   YYINITIAL
%%
Lexical analysis rules
    Optional state, regular expression, action
    How to break input to tokens
    Action when token matched
    Example pattern: {LETTER} ({LETTER}|{DIGIT})*

User code
    package Scheme.Parser;
    import Scheme.Parser.Symbol;
    … any scanner-helper Java code

JFlex directives
Directives control JFlex internals:
    %line                           switches line counting on
    %char                           switches character counting on
    %cup                            CUP compatibility mode
    %class class-name               changes default name
    %type token-class-name
    %public                         makes generated class public (package by default)
    %function read-token-method
    %scanerror exception-type-name
State definitions:
    %state state-name
Macro definitions:
    macro-name = regex

Regular expressions
    r$        match reg. exp. r only at the end of a line
    . (dot)   any character except the newline
    "..."     verbatim string
    {name}    macro expansion
    *         zero or more repetitions
    +         one or more repetitions
    ?         zero or one repetitions
    (...)     grouping within regular expressions
    a|b       match a or b
    [...]     class of characters: any one character enclosed in brackets
    a-b       range of characters (inside a class)
    [^...]    negated class: any one character not enclosed in brackets

Partway example
    import java_cup.runtime.Symbol;
    %%
    %cup
    %line
    %char
    %state STRING
    ALPHA=[A-Za-z_]
    DIGIT=[0-9]
    ALPHA_NUMERIC={ALPHA}|{DIGIT}
    IDENT={ALPHA}({ALPHA_NUMERIC})*
    NUMBER=({DIGIT})+
    WHITE_SPACE=([\ \n\r\t\f])+
    %{
      private int lineCounter = 0;
    %}
    …
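To check what the macros above accept, they can be mirrored with java.util.regex. This is only an approximation for experimenting: the class name MacroSketch is hypothetical, java.util.regex syntax is not identical to JFlex syntax, and string concatenation stands in for {NAME} macro expansion.

```java
import java.util.regex.Pattern;

// Mirrors the macro definitions from the example with java.util.regex.
// String concatenation stands in for JFlex's {NAME} macro expansion.
class MacroSketch {
    static final String ALPHA = "[A-Za-z_]";
    static final String DIGIT = "[0-9]";
    // Explicit parentheses keep the alternation grouped when the string is reused.
    static final String ALPHA_NUMERIC = "(" + ALPHA + "|" + DIGIT + ")";

    static final Pattern IDENT = Pattern.compile(ALPHA + ALPHA_NUMERIC + "*");
    static final Pattern NUMBER = Pattern.compile(DIGIT + "+");
    static final Pattern WHITE_SPACE = Pattern.compile("[ \\n\\r\\t\\f]+");

    // True if the whole string is matched by the pattern.
    static boolean accepts(Pattern pattern, String s) {
        return pattern.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println("IDENT  foo_1 : " + accepts(IDENT, "foo_1"));
        System.out.println("IDENT  1foo  : " + accepts(IDENT, "1foo"));
        System.out.println("NUMBER 89    : " + accepts(NUMBER, "89"));
        System.out.println("NUMBER 89.94 : " + accepts(NUMBER, "89.94"));
    }
}
```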

Scanner states example
[Diagram: three scanner states, including YYINITIAL and STRING. Each state has its own rule list (Regular Expression 1 -> do 1, ..., Regular Expression 4 -> do 4). Transitions between YYINITIAL and STRING are labelled \" and \"; the transitions to and from the third state are labelled // and \n.]

Lexical analysis rules
Rule structure:
    [states] regexp {action as Java code}
regexp pattern: how to break input into tokens
Action: invoked when the pattern is matched
Priority for rule matching: longest string. This can be either good or bad, depending on context:
    /** @Javadoc */ Class A{… /*end*/     (bad: a comment pattern that takes the longest string runs from the first /* to the last */)
    Int a = 1000000000000                 (good: the whole number becomes one token)

More than one match for the same length: priority goes to the rule appearing first!
Example: 'if' matches both identifiers and the reserved word
Order leads to different automata
Important: rules given in a JFlex specification should match all possible inputs!
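The two priority rules (longest match first, then earliest rule on a tie) can be sketched with java.util.regex. This is an illustration, not how JFlex works internally: JFlex compiles the rules into an automaton, while this sketch simply tries every rule at each position. The class name and the rule set (IF, ID, NUMBER, WS, ERROR) are hypothetical, and the patterns are simple enough that java.util.regex's greedy matching yields the longest match for each rule.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of "longest match wins, earlier rule breaks ties".
class LongestMatchSketch {
    // Rules in priority order (LinkedHashMap keeps insertion order).
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("IF", Pattern.compile("if"));
        RULES.put("ID", Pattern.compile("[a-zA-Z][a-zA-Z0-9]*"));
        RULES.put("NUMBER", Pattern.compile("[0-9]+"));
        RULES.put("WS", Pattern.compile("[ \\t\\n]+"));
        // Catch-all last rule, so that every input character is matched.
        RULES.put("ERROR", Pattern.compile(".", Pattern.DOTALL));
    }

    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Strictly greater: on a tie the earlier rule keeps the match.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (!"WS".equals(bestName)) {
                tokens.add(bestName + "(" + input.substring(pos, bestEnd) + ")");
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("ifelse"));
        System.out.println(scan("if a"));
    }
}
```

With these rules, "ifelse" becomes a single ID because the identifier match is longer, while "if" alone becomes IF because both rules match two characters and IF is listed first. Swapping the order of IF and ID would make every "if" an identifier.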

Action body
Java code
Can use special methods and vars:
    yytext(): the actual token text
    yyline (when enabled)
    …
Scanner state transition:
    yybegin(state-name): tells JFlex to jump to the given state
    YYINITIAL: name given by JFlex to the initial state

    <YYINITIAL> {NUMBER}      { return new Symbol(sym.NUMBER, yytext(), yyline); }
    <YYINITIAL> {WHITE_SPACE} { }
    <YYINITIAL> "+"           { return new Symbol(sym.PLUS, yytext(), yyline); }
    <YYINITIAL> "-"           { return new Symbol(sym.MINUS, yytext(), yyline); }
    <YYINITIAL> "*"           { return new Symbol(sym.TIMES, yytext(), yyline); }
    ...
    <YYINITIAL> "//"          { yybegin(COMMENTS); }
    <COMMENTS> [^\n]          { }
    <COMMENTS> [\n]           { yybegin(YYINITIAL); }
    <YYINITIAL> .             { return new Symbol(sym.error, null); }
Notes:
    Symbol: special class for capturing token information (class Token defined elsewhere)
    State transitions YYINITIAL <-> COMMENTS
    Default rule (with the dot) comes as the last rule!
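The effect of the YYINITIAL and COMMENTS states can be imitated by hand with a state variable. The following Java sketch is hypothetical (class name, token strings and the enum are made up for illustration) and only mimics the behaviour of the rules above; the scanner JFlex generates is table-driven.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written imitation of the YYINITIAL / COMMENTS rules.
class StateSketch {
    enum State { YYINITIAL, COMMENTS }

    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        State state = State.YYINITIAL;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (state == State.COMMENTS) {
                // <COMMENTS> [\n] goes back to YYINITIAL; <COMMENTS> [^\n] is skipped.
                if (c == '\n') {
                    state = State.YYINITIAL;
                }
                i++;
            } else if (input.startsWith("//", i)) {
                state = State.COMMENTS; // plays the role of yybegin(COMMENTS)
                i += 2;
            } else if (isDigit(c)) {
                int start = i;
                while (i < input.length() && isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add("NUMBER(" + input.substring(start, i) + ")");
            } else if (c == ' ' || c == '\n' || c == '\r' || c == '\t' || c == '\f') {
                i++; // white space: empty action
            } else if (c == '+') {
                tokens.add("PLUS");
                i++;
            } else if (c == '-') {
                tokens.add("MINUS");
                i++;
            } else if (c == '*') {
                tokens.add("TIMES");
                i++;
            } else {
                tokens.add("error"); // default rule (the dot)
                i++;
            }
        }
        return tokens;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        System.out.println(scan("1 + 2 // 3 * 4\n5"));
    }
}
```

Everything between "//" and the end of the line produces no tokens, because in the COMMENTS state only the newline rule has any effect.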

Additional Example http://jflex.de/manual.html#SECTION00040000000000000000

Running the scanner
(Just for testing scanner as stand-alone program)

    import java.io.*;
    import java_cup.runtime.Symbol;

    public class Main {
        public static void main(String[] args) {
            Symbol currToken;
            try {
                // The input file name is the first command-line argument.
                FileReader txtFile = new FileReader(args[0]);
                Yylex scanner = new Yylex(txtFile);
                do {
                    currToken = scanner.next_token();
                    // do something with currToken
                } while (currToken.sym != sym.EOF);
            } catch (Exception e) {
                throw new RuntimeException("IO Error (brutal exit) " + e.toString());
            }
        }
    }