Winter 2006-2007 Compiler Construction T2 – Lexical Analysis (Scanning) Mooly Sagiv and Roman Manevich School of Computer Science Tel-Aviv University.

Presentation transcript:

Winter Compiler Construction T2 – Lexical Analysis (Scanning) Mooly Sagiv and Roman Manevich School of Computer Science Tel-Aviv University

2 Material taught in lecture
- Scanner specification language: regular expressions
- Scanner generation using automata theory + extra book-keeping
- Basic Java tutorial

3 Today
Goals:
- Quick review of lexical analysis theory
- Implementing a scanner for IC via JFlex
- Explain PA1
[Figure: compiler pipeline. IC Language (ic) -> Lexical Analysis -> Syntax Analysis (Parsing) -> AST, Symbol Table, etc. -> Inter. Rep. (IR) -> Code Generation -> Executable code (exe)]

4 Scanning IC programs
IC program text:
    class Quicksort {
        int[] a;
        int partition(int low, int high) {
            int pivot = a[low];
            ...
    }
Tokens, printed as LINE: ID(VALUE):
    1: CLASS
    1: CLASS_ID(Quicksort)
    1: LCBR
    2: INT
    2: LB
    2: RB
    2: ID(a)
    ...

5 Scanner implementation
What are the outputs on the following inputs:
    ifelse if a

6 Lexical analysis with JFlex
- JFlex: fast lexical analyzer generator
  - Recognizes lexical patterns in text
  - Breaks input character stream into tokens
- Input: scanner specification file
- Output: a lexical analyzer (scanner)
  - A Java program
[Figure: IC.lex -> JFlex -> Lexer.java -> javac -> Lexical analyzer; text -> Lexical analyzer -> tokens]

7 JFlex spec. file
A JFlex specification has three sections separated by %% lines:

User code
  - Copied directly to Java file
  - Possible source of javac errors down the road
%%
JFlex directives
  - Define macros, state names
  - Example:
        DIGIT= [0-9]
        LETTER= [a-zA-Z]
%%
Lexical analysis rules
  - Optional state, regular expression, action
  - How to break input to tokens
  - Action when token matched
  - Example:
        <YYINITIAL> {LETTER}({LETTER}|{DIGIT})*

8 User code
    package IC.Lexer;
    import IC.Parser.Symbol;
    … any scanner-helper Java code …

9 JFlex directives
Directives control JFlex internals:
- %line: switches line counting on
- %char: switches character counting on
- %class class-name
- %cup: CUP compatibility mode
- %type token-class-name
- %public: makes generated class public (package by default)
- %function read-token-method
- %scanerror exception-type-name
State definitions:
    %state state-name
Macro definitions:
    macro-name = regex

10 Regular expressions
    r$        match reg. exp. r at end of a line
    .         (dot) any character except the newline
    "..."     verbatim string
    {name}    macro expansion
    *         zero or more repetitions
    +         one or more repetitions
    ?         zero or one repetitions
    (...)     grouping within regular expressions
    a|b       match a or b
    [...]     class of characters: any one character enclosed in brackets
    a-b       range of characters
    [^...]    negated class: any one character not enclosed in brackets

11 Example macros
    ALPHA=[A-Za-z_]
    DIGIT=[0-9]
    ALPHA_NUMERIC={ALPHA}|{DIGIT}
    IDENT={ALPHA}({ALPHA_NUMERIC})*
    NUMBER=({DIGIT})+
    WHITE_SPACE=([\ \n\r\t\f])+
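To try the macros outside JFlex, they can be rewritten as java.util.regex patterns. This is only a rough sketch: the class name MacroDemo and the helper methods are made up for illustration, and Java regex syntax is not identical to JFlex syntax (for example, macro expansion is imitated here with string concatenation and a non-capturing group).

```java
import java.util.regex.Pattern;

public class MacroDemo {
    // Java regex counterparts of the JFlex macros (hypothetical translation).
    static final String ALPHA = "[A-Za-z_]";
    static final String DIGIT = "[0-9]";
    // Grouped so that the alternation stays a single unit when reused.
    static final String ALPHA_NUMERIC = "(?:" + ALPHA + "|" + DIGIT + ")";

    static final Pattern IDENT = Pattern.compile(ALPHA + ALPHA_NUMERIC + "*");
    static final Pattern NUMBER = Pattern.compile(DIGIT + "+");
    static final Pattern WHITE_SPACE = Pattern.compile("[ \\n\\r\\t\\f]+");

    // matches() requires the whole string to fit the pattern.
    static boolean isIdent(String s) { return IDENT.matcher(s).matches(); }
    static boolean isNumber(String s) { return NUMBER.matcher(s).matches(); }
    static boolean isWhiteSpace(String s) { return WHITE_SPACE.matcher(s).matches(); }

    public static void main(String[] args) {
        System.out.println("pivot_1 is IDENT: " + isIdent("pivot_1"));
        System.out.println("1abc is IDENT: " + isIdent("1abc"));
        System.out.println("2006 is NUMBER: " + isNumber("2006"));
    }
}
```

An identifier must start with a letter or underscore, so a string such as 1abc is rejected as a whole by IDENT.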

12 Lexical analysis rules
- Rule structure: [states] regexp {action as Java code}
- regexp pattern: how to break input into tokens
- Action invoked when pattern matched
- Priority for rule matching longest string
- More than one match for same length: priority for rule appearing first!
- Important: rules given in a JFlex specification should match all possible inputs!
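The two priority rules (longest match first, then earliest rule on a tie) can be imitated by hand in plain Java. The sketch below is not how JFlex works internally (JFlex builds an automaton); it simply tries every rule at the current position. The token names IF, ELSE, ID and WS and the class name MaxMunch are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaxMunch {
    // Rules in priority order: an earlier rule wins a tie on match length.
    static final String[] NAMES = {"IF", "ELSE", "ID", "WS"};
    static final Pattern[] RULES = {
        Pattern.compile("if"),
        Pattern.compile("else"),
        Pattern.compile("[A-Za-z_][A-Za-z_0-9]*"),
        Pattern.compile("[ \\n\\r\\t\\f]+")
    };

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestLen = 0;
            int bestRule = -1;
            for (int i = 0; i < RULES.length; i++) {
                Matcher m = RULES[i].matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors the match at the start of the region.
                // The strict '>' keeps the earlier rule when lengths are equal.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestRule = i;
                }
            }
            if (bestRule < 0) {
                // The rules should match all possible inputs; here they do not.
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            String text = input.substring(pos, pos + bestLen);
            if (NAMES[bestRule].equals("ID")) {
                tokens.add("ID(" + text + ")");
            } else if (!NAMES[bestRule].equals("WS")) {
                tokens.add(NAMES[bestRule]);
            }
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("ifelse"));
        System.out.println(tokenize("if a"));
    }
}
```

With these rules, the input ifelse becomes a single identifier because the ID rule matches six characters while the IF rule matches only two. For the input if a, the IF and ID rules both match two characters, so the IF rule wins because it is listed first.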

13 Action body
- Java code
- Can use special methods and vars:
  - yytext()
  - yyline (when enabled)
  - …
- Scanner state transition:
  - yybegin(state-name)
  - YYINITIAL

14 More on scanner states
- Independent mini-scanner inside scanner
  - comments /* */
  - quote string " "
- Scan syntactically different portion of the input
- Tokenize according to context
- Example:
  - "if" is a keyword token when in program text
  - "if" is part of comment text when inside a comment

    // this condition checks if x > y
    if (x>y) {… }
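The idea of a mini-scanner per state can be sketched in plain Java with an explicit state variable. This is a hand-written illustration, not generated JFlex code: the class name StateDemo, the token spellings IF and ID(...), and the decision to skip punctuation are all assumptions made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

public class StateDemo {
    // Two scanner states, named after the JFlex states used in the slides.
    enum State { YYINITIAL, COMMENTS }

    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        State state = State.YYINITIAL;
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (state == State.COMMENTS) {
                // Inside a comment everything is ignored until the newline.
                if (c == '\n') {
                    state = State.YYINITIAL;
                }
                pos++;
            } else if (input.startsWith("//", pos)) {
                state = State.COMMENTS;
                pos += 2;
            } else if (Character.isLetter(c)) {
                int start = pos;
                while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) {
                    pos++;
                }
                String word = input.substring(start, pos);
                tokens.add(word.equals("if") ? "IF" : "ID(" + word + ")");
            } else {
                pos++; // whitespace and punctuation are not modeled in this sketch
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("// this condition checks if x > y\nif (x>y)"));
    }
}
```

The word if inside the comment produces no token, while the same two characters on the next line produce the keyword token.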

15
    <YYINITIAL> {NUMBER} { return new Symbol(sym.NUMBER, yytext(), yyline); }
    <YYINITIAL> {WHITE_SPACE} { }
    <YYINITIAL> "+" { return new Symbol(sym.PLUS, yytext(), yyline); }
    <YYINITIAL> "-" { return new Symbol(sym.MINUS, yytext(), yyline); }
    <YYINITIAL> "*" { return new Symbol(sym.TIMES, yytext(), yyline); }
    ...
    <YYINITIAL> "//" { yybegin(COMMENTS); }
    <COMMENTS> [^\n] { }
    <COMMENTS> [\n] { yybegin(YYINITIAL); }
    . { return new Symbol(sym.error, null); }

16 Putting it all together: count number of lines
lineCount.lex:
    import java_cup.runtime.Symbol;
    %%
    %cup
    %{
      private int lineCounter = 0;
    %}
    %eofval{
      System.out.println("line number=" + lineCounter);
      return new Symbol(sym.EOF);
    %eofval}
    NEWLINE=\n
    %%
    {NEWLINE} { lineCounter++; }
    [^{NEWLINE}] { }

17 Putting it all together: count number of lines
    java JFlex.Main lineCount.lex
    javac *.java
- JFlex and JavaCup must be on CLASSPATH
[Figure: lineCount.lex -> JFlex -> lineCount.java -> javac (together with Main.java and sym.java) -> Lexical analyzer; text -> Lexical analyzer -> tokens]

18 Running the scanner
(Just for testing scanner as stand-alone program)
    import java.io.*;
    import java_cup.runtime.Symbol;

    public class Main {
        public static void main(String[] args) {
            Symbol currToken;
            try {
                FileReader txtFile = new FileReader(args[0]);
                Yylex scanner = new Yylex(txtFile);
                // Pull tokens until the scanner reports end of file.
                do {
                    currToken = scanner.next_token();
                    // do something with currToken
                } while (currToken.sym != sym.EOF);
            } catch (Exception e) {
                throw new RuntimeException("IO Error (brutal exit)" + e.toString());
            }
        }
    }

19 Common pitfalls
- Classpath
- Path to executable
- Define environment variables
  - JAVA_HOME
  - CLASSPATH
  - May want to include current directory '.'
- Note the use of . (dot) as part of package name / directory structure
  - e.g., JFlex.Main

20 Programming assignment 1
- Implement a scanner for IC
- class Token
  - At least: line, id, value
  - Should extend java_cup.runtime.Symbol
- Numeric token ids in sym.java
  - Will be later generated by JavaCup
- class Compiler
  - Testbed: calls scanner to print list of tokens
- class LexicalError
  - Caught by Compiler
- Don't forget to generate scanner and recompile Java sources when you change the spec.
- You need to download and install both JFlex and JavaCup

21 sym.java file
    public class sym {
        public static final int EOF = 0;
        ...
    }
- Defines symbol constant ids
- Tells parser what is the token returned by scanner
- Actual value doesn't matter
- But different tokens should have different values
- In the future will be generated by JavaCup

22 Token class
    import java_cup.runtime.Symbol;

    public class Token extends Symbol {
        public int getId() {...}
        public Object getValue() {...}
        public int getLine() {...}
        ...
    }
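One possible way to fill in the elided getters is sketched below. It is only a guess at a workable layout, not the assignment's reference solution. To keep the example compilable without the JavaCup jar, it uses a small stand-in Symbol class that mirrors just the sym and value fields of java_cup.runtime.Symbol; the wrapper class TokenSketch, the constructor signature, and the separate line field are all assumptions.

```java
public class TokenSketch {
    // Stand-in for java_cup.runtime.Symbol so the sketch compiles on its own.
    // Only the `sym` and `value` fields of the real class are mirrored here.
    static class Symbol {
        public int sym;
        public Object value;

        public Symbol(int id, Object value) {
            this.sym = id;
            this.value = value;
        }
    }

    // Hypothetical Token: the slide fixes only the three getter names.
    static class Token extends Symbol {
        private final int line;

        public Token(int id, int line, Object value) {
            super(id, value);
            this.line = line;
        }

        public int getId() { return sym; }
        public Object getValue() { return value; }
        public int getLine() { return line; }
    }

    public static void main(String[] args) {
        Token t = new Token(0, 1, "Quicksort");
        // Printed in the LINE: ID(VALUE) shape, with the numeric id.
        System.out.println(t.getLine() + ": " + t.getId() + "(" + t.getValue() + ")");
    }
}
```

In the real assignment the class would extend java_cup.runtime.Symbol directly, and a testbed would typically map the numeric id back to a readable name before printing.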

23 (some) JFlex directives to use
    %cup          (integrate with cup)
    %line         (count lines)
    %type Token   (pass type Token)
    %class Lexer  (gen. scanner class)

24 Structure of PA1
[Figure: IC.lex -> JFlex -> Lexer.java -> javac (together with sym.java, Token.java, LexicalError.java, Compiler.java) -> Lexical analyzer; test.ic -> Lexical analyzer -> tokens]

25 Beginning the assignment
- Download J2SE 1.5
  - Make sure you can compile and run programs
- Download JFlex
- Download JavaCup
- Use of Eclipse is recommended
- Use of Apache Ant is recommended
- JFlex and JavaCup must be in the CLASSPATH
- Use assignment skeleton pa1.zip to avoid unnecessary mistakes

26 Administrative issues
- Check your details in the list of project teams
  - If you don't have a team, get one by PA deadline
- PA1
  - Electronic submission due Nov. 15, midnight
  - Printout of the README due Nov. 15, noon, mailbox 268 Schreiber
- Carefully READ the material
- Use FORUM

27 See you next week