Professor Yihjia Tsai Tamkang University

Using JavaCC

Automating Lexical Analysis
Overall picture (figure): inside the scanner generator, RE → NFA → DFA → minimized DFA; the generated Java scanner program simulates the DFA, turning a string (character) stream into tokens.

Building Faster Scanners from the DFA
Table-driven recognizers waste a lot of effort on every input character:
- read (and classify) the next character
- find the next state
- assign to the state variable
- branch back to the top
We can do better:
- encode states and actions in the code
- do transition tests locally
- generate ugly, spaghetti-like code (this is OK; it is automatically generated code)
- take (many) fewer operations per input character
The table-driven skeleton being improved on (δ is the transition table, ε the empty string):

    state = s0; string = ε; char = get_next_char();
    while (char != eof) {
        state = δ(state, char);
        string = string + char;
        char = get_next_char();
    }
    if (state in Final) then report acceptance;
    else report failure;
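To make the contrast concrete, here is a minimal Java sketch of both styles for one assumed token, an unsigned integer [0-9]+. The class name ScannerStyles, the state numbering, and the two-column character classification are illustrative choices, not output from any scanner generator.

```java
/** Two recognizers for the regular expression [0-9]+ (illustrative sketch). */
final class ScannerStyles {
    // Character classes: 0 = digit, 1 = anything else.
    private static int classify(char c) {
        return (c >= '0' && c <= '9') ? 0 : 1;
    }

    // States: 0 = start, 1 = seen at least one digit (accepting), 2 = error (dead).
    private static final int[][] DELTA = {
        {1, 2}, // from s0
        {1, 2}, // from s1
        {2, 2}, // from the error state
    };
    private static final boolean[] FINAL = {false, true, false};

    /** Table-driven: classify, look up the next state, branch back (like the pseudocode, it keeps looping after an error). */
    static boolean tableDriven(String input) {
        int state = 0;
        for (int i = 0; i < input.length(); i++) {
            state = DELTA[state][classify(input.charAt(i))];
        }
        return FINAL[state];
    }

    /** Direct-coded: each state becomes a block of code whose transition tests are local. */
    static boolean directCoded(String input) {
        int i = 0, n = input.length();
        // s0: the first character must be a digit
        if (i == n) return false;
        char c = input.charAt(i++);
        if (c < '0' || c > '9') return false;
        // s1: loop on digits; reaching the end of input here means accept
        while (i < n) {
            c = input.charAt(i++);
            if (c < '0' || c > '9') return false;
        }
        return true;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2024", "", "12a"}) {
            System.out.println("\"" + s + "\" -> " + tableDriven(s) + " / " + directCoded(s));
        }
    }
}
```

The direct-coded version needs no table lookups and exits as soon as it hits a non-digit; a generator would emit one such block per DFA state.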

Inside a lexical analyzer generator
How does a lexical analyzer generator work?
1. Get input from the user, who defines tokens in a form equivalent to a regular grammar.
2. Turn the regular grammar into an NFA.
3. Convert the NFA into a DFA.
4. Generate the code that simulates the DFA.
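The NFA-to-DFA conversion can be sketched in Java with the classic subset construction. This is a minimal sketch, assuming a hand-built Thompson-style NFA for the example expression a(b|c)*; the state numbering 0 to 9, the EPS label, and the class name SubsetConstruction are our own illustrative choices, not JavaCC's internals. Minimization and code generation are omitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Subset construction for a hand-built Thompson-style NFA of a(b|c)* (illustrative sketch). */
final class SubsetConstruction {
    static final char EPS = '\0';                 // label for epsilon edges (assumes '\0' never occurs in input)
    static final int NFA_START = 0, NFA_ACCEPT = 9;
    static final char[] ALPHABET = {'a', 'b', 'c'};

    // nfa.get(s) holds {label, target} pairs for the edges leaving NFA state s
    static final List<List<int[]>> nfa = new ArrayList<>();
    // each DFA state is a set of NFA states; dfaDelta.get(i) maps a character to a DFA state index
    static final List<Set<Integer>> dfaStates = new ArrayList<>();
    static final List<Map<Character, Integer>> dfaDelta = new ArrayList<>();

    static {
        for (int s = 0; s <= 9; s++) nfa.add(new ArrayList<>());
        edge(0, 'a', 1);
        edge(1, EPS, 2);                          // concatenation
        edge(2, EPS, 3); edge(2, EPS, 9);         // enter the star, or skip it
        edge(3, EPS, 4); edge(3, EPS, 6);         // choose b or c
        edge(4, 'b', 5); edge(6, 'c', 7);
        edge(5, EPS, 8); edge(7, EPS, 8);         // rejoin after the alternation
        edge(8, EPS, 3); edge(8, EPS, 9);         // repeat, or leave the star
        build();
    }

    static void edge(int from, char label, int to) {
        nfa.get(from).add(new int[] {label, to});
    }

    /** Epsilon-closure: every NFA state reachable from 'states' using only epsilon edges. */
    static Set<Integer> closure(Set<Integer> states) {
        Set<Integer> result = new TreeSet<>(states);
        Deque<Integer> work = new ArrayDeque<>(states);
        while (!work.isEmpty()) {
            int s = work.pop();
            for (int[] e : nfa.get(s)) {
                if (e[0] == EPS && result.add(e[1])) work.push(e[1]);
            }
        }
        return result;
    }

    /** NFA states reachable from 'states' by one edge labelled c. */
    static Set<Integer> move(Set<Integer> states, char c) {
        Set<Integer> result = new TreeSet<>();
        for (int s : states) {
            for (int[] e : nfa.get(s)) {
                if (e[0] == c) result.add(e[1]);
            }
        }
        return result;
    }

    /** Worklist algorithm: DFA states not yet processed sit at the end of dfaStates. */
    static void build() {
        Map<Set<Integer>, Integer> index = new HashMap<>();
        Set<Integer> d0 = closure(Set.of(NFA_START));
        index.put(d0, 0);
        dfaStates.add(d0);
        dfaDelta.add(new HashMap<>());
        for (int i = 0; i < dfaStates.size(); i++) {
            for (char c : ALPHABET) {
                Set<Integer> t = closure(move(dfaStates.get(i), c));
                if (t.isEmpty()) continue;        // no transition: implicit dead state
                Integer j = index.get(t);
                if (j == null) {
                    j = dfaStates.size();
                    index.put(t, j);
                    dfaStates.add(t);
                    dfaDelta.add(new HashMap<>());
                }
                dfaDelta.get(i).put(c, j);
            }
        }
    }

    /** Simulates the DFA; a DFA state accepts if it contains the NFA's accepting state. */
    static boolean accepts(String input) {
        int d = 0;
        for (char c : input.toCharArray()) {
            Integer next = dfaDelta.get(d).get(c);
            if (next == null) return false;
            d = next;
        }
        return dfaStates.get(d).contains(NFA_ACCEPT);
    }

    public static void main(String[] args) {
        System.out.println("DFA states (as sets of NFA states): " + dfaStates);
        System.out.println("abcb accepted? " + accepts("abcb"));
    }
}
```

For a(b|c)* this yields four DFA states, with the dead state left implicit; three of them contain NFA state 9 and are therefore accepting.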

Flow for Using JavaCC (figure extracted from http://www.cs.unb.ca/profs/nickerson/courses/cs4905/Labs/L1_2006.pdf)

Structure of a JavaCC File
A JavaCC file is composed of three portions:
1. Options
2. Class declaration
3. Specification for lexical analysis (tokens), the focus of this lecture, and specification for syntax analysis
For the very first JavaCC example, let's recognize two tokens: "+" and numerals. Use an editor to write the file and save it as numeral.jj.

Using JavaCC for lexical analysis
JavaCC is a “top-down” parser generator. Some parser generators (such as yacc, bison, and JavaCUP) need a separate lexical-analyzer generator. With JavaCC, you can specify the tokens within the parser generator.

Example File

    /* main class definition */
    PARSER_BEGIN(Numeral)
    public class Numeral {
        public static void main(String[] args) throws ParseException, TokenMgrError {
            Numeral numeral = new Numeral(System.in);
            // read tokens until the end of the input
            while (numeral.getNextToken().kind != EOF);
        }
    }
    PARSER_END(Numeral)

    /* token definitions */
    TOKEN :
    {
        <ADD: "+">
      | <NUMERAL: (["0"-"9"])+>
    }
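For intuition about what the generated token manager does with these two token definitions, here is a hand-written Java sketch of an equivalent longest-match scanner. It is not JavaCC's generated code: the class NumeralLexer, its Kind enum and Token record, and the use of IllegalStateException in place of TokenMgrError are illustrative assumptions. Like the grammar above, it defines no whitespace token, so a blank in the input is a lexical error.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written analogue of a token manager for <ADD: "+"> | <NUMERAL: (["0"-"9"])+>. */
final class NumeralLexer {
    enum Kind { ADD, NUMERAL, EOF }

    record Token(Kind kind, String image) { }

    private final String input;
    private int pos = 0;

    NumeralLexer(String input) { this.input = input; }

    /** Returns the next token; a NUMERAL always takes the longest run of digits. */
    Token getNextToken() {
        if (pos == input.length()) return new Token(Kind.EOF, "");
        char c = input.charAt(pos);
        if (c == '+') {
            pos++;
            return new Token(Kind.ADD, "+");
        }
        if (c >= '0' && c <= '9') {
            int start = pos;
            while (pos < input.length() && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') pos++;
            return new Token(Kind.NUMERAL, input.substring(start, pos));
        }
        // Stand-in for JavaCC's TokenMgrError: no token definition matches this character.
        throw new IllegalStateException("Lexical error at offset " + pos + ": '" + c + "'");
    }

    /** Mirrors the loop in main: pull tokens until EOF. */
    static List<Token> tokenize(String input) {
        NumeralLexer lexer = new NumeralLexer(input);
        List<Token> tokens = new ArrayList<>();
        for (Token t = lexer.getNextToken(); t.kind() != Kind.EOF; t = lexer.getNextToken()) {
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12+345"));
    }
}
```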

Options
The options portion is optional and is omitted in the previous example. STATIC is a boolean option whose default value is true. If true, all methods and class variables are specified as static in the generated parser and token manager. This allows only one parser object to be present, but it improves the performance of the parser. To perform multiple parses during one run of your Java program, you will have to call the ReInit() method to reinitialize your parser if it is static. If the parser is non-static, you may use the "new" operator to construct as many parsers as you wish. These can all be used simultaneously from different threads.
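The trade-off behind STATIC can be seen in a plain-Java analogy. This is not JavaCC output; StaticScanner, InstanceScanner, and reInit are hypothetical names chosen only to mirror the generated ReInit(). With all state static there is exactly one scanner per program run, and it must be re-initialized before each new input; with instance state, each object created with new is independent.

```java
/** Analogy for STATIC = true: all state is static, so there is one scanner per program run. */
final class StaticScanner {
    private static String input = "";
    private static int pos = 0;

    /** Must be called before each new parse, like ReInit() on a static JavaCC parser. */
    static void reInit(String newInput) {
        input = newInput;
        pos = 0;
    }

    /** Consumes the rest of the current input and counts its digits. */
    static int countDigits() {
        int n = 0;
        while (pos < input.length()) {
            if (Character.isDigit(input.charAt(pos++))) n++;
        }
        return n;
    }
}

/** Analogy for STATIC = false: each object owns its state, so many can coexist, each used by its own thread. */
final class InstanceScanner {
    private final String input;
    private int pos = 0;

    InstanceScanner(String input) { this.input = input; }

    int countDigits() {
        int n = 0;
        while (pos < input.length()) {
            if (Character.isDigit(input.charAt(pos++))) n++;
        }
        return n;
    }
}
```

Calling StaticScanner.countDigits() twice without reInit returns 0 the second time, because the single shared position is already at the end of the input.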

Start
The program starts in main, which is a simple loop getting tokens: it constructs a Numeral parser on System.in and calls getNextToken() until a token of kind EOF is returned.

    while (numeral.getNextToken().kind != EOF);

Compilation
After running javacc on numeral.jj, seven files are generated if no error messages occur: Numeral.java, NumeralConstants.java, NumeralTokenManager.java, ParseException.java, SimpleCharStream.java, Token.java, and TokenMgrError.java.

    bash-2.05$ javacc numeral.jj
    Java Compiler Compiler Version 3.2 (Parser Generator)
    (type "javacc" with no arguments for help)
    Reading from file numeral.jj . . .
    File "TokenMgrError.java" does not exist. Will create one.
    File "ParseException.java" does not exist. Will create one.
    File "Token.java" does not exist. Will create one.
    File "SimpleCharStream.java" does not exist. Will create one.
    Parser generated successfully

JavaCC specification of a lexer: Defining Whitespace
Note the need for ( )!

A Full Example
See the sample file.

Dealing with errors
Error reporting: consider the input 123e+q. The scanner could consider it an invalid token (a lexical error, since 123e+ starts an exponent that is never completed), or it could return a sequence of valid tokens 123, e, +, q and let the parser deal with the error.
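Both policies can be sketched in Java. The token shapes here are assumptions for illustration: numbers are digits with an optional exponent, identifiers are a letter followed by letters or digits, and any other character is a one-character operator. The lenient mode uses the common technique of remembering the last accepting position and rolling back to it; the class ExponentScan and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Two policies for an input like 123e+q (illustrative sketch; the token shapes are assumptions). */
final class ExponentScan {
    /** end = index just past the longest valid number; brokenExponent = an 'e' followed but never completed. */
    record NumberScan(int end, boolean brokenExponent) { }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    /** Longest match for digits ( [eE] [+-]? digits )? starting at 'start', which must be a digit. */
    static NumberScan scanNumber(String s, int start) {
        int i = start, n = s.length();
        while (i < n && isDigit(s.charAt(i))) i++;
        if (i < n && (s.charAt(i) == 'e' || s.charAt(i) == 'E')) {
            int j = i + 1;
            if (j < n && (s.charAt(j) == '+' || s.charAt(j) == '-')) j++;
            int digitsStart = j;
            while (j < n && isDigit(s.charAt(j))) j++;
            if (j > digitsStart) return new NumberScan(j, false);  // complete exponent
            return new NumberScan(i, true);                        // roll back to the last accepting position
        }
        return new NumberScan(i, false);
    }

    /** strict = true: report a lexical error; strict = false: emit valid tokens and let the parser complain. */
    static List<String> scan(String s, boolean strict) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (isDigit(c)) {
                NumberScan num = scanNumber(s, i);
                if (strict && num.brokenExponent())
                    throw new IllegalArgumentException("lexical error: malformed number at offset " + i);
                tokens.add(s.substring(i, num.end()));
                i = num.end();
            } else if (Character.isLetter(c)) {      // identifier: a letter, then letters or digits
                int start = i;
                while (i < s.length() && Character.isLetterOrDigit(s.charAt(i))) i++;
                tokens.add(s.substring(start, i));
            } else {                                 // any other character is a one-character operator
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("123e+q", false));
        try {
            scan("123e+q", true);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The lenient policy keeps the scanner simple and pushes the diagnosis to the parser, which will likely report something less precise than "malformed number".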

Lexical error correction?
Sometimes interaction between the scanner and the parser can help, especially in a top-down (predictive) parse. The parser, when it calls the scanner, can pass as an argument the set of allowable tokens. Suppose the scanner sees calss in a context where only a top-level definition is allowed. It is not too hard to guess what is meant: the scanner can guess that class was intended, generate a warning message, and return to parsing. Why should a compiler halt if it can figure out, with high probability, what was intended? Most lexical errors are character insertions, deletions, replacements, or transpositions. The PL/C compiler from the mid-1970s did this well.
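The guessing step can be sketched in Java with the optimal string alignment distance (a restricted Damerau-Levenshtein distance), which counts exactly the four error kinds listed: insertion, deletion, replacement, and adjacent transposition. The allowed-keyword list stands in for the token set the parser would pass; the class KeywordRepair and its "accept only a unique candidate one edit away" policy are assumptions, not a description of how PL/C worked.

```java
import java.util.List;

/** Guessing a misspelled keyword from the tokens the parser currently allows (illustrative sketch). */
final class KeywordRepair {
    /**
     * Optimal string alignment distance: the minimum number of single-character insertions,
     * deletions, replacements, and adjacent transpositions needed to turn a into b.
     */
    static int osaDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
                if (i > 1 && j > 1 && a.charAt(i - 1) == b.charAt(j - 2) && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);   // adjacent transposition
                }
            }
        }
        return d[a.length()][b.length()];
    }

    /** Returns the word itself if allowed, else the single allowed keyword one edit away, else null. */
    static String guess(String word, List<String> allowed) {
        if (allowed.contains(word)) return word;
        String best = null;
        for (String k : allowed) {
            if (osaDistance(word, k) == 1) {
                if (best != null) return null;   // two candidates: too risky to guess
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> topLevel = List.of("class", "interface", "enum");  // hypothetical allowed set
        String word = "calss";
        String fix = guess(word, topLevel);
        if (fix != null) System.out.println("warning: assuming '" + fix + "' for '" + word + "'");
    }
}
```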

Same symbol, different meaning
How can the scanner distinguish between binary minus and unary minus, as in x = -a; versus x = 3 - a;? It can't. It simply has to pass MINUS back to the parser and let the parser distinguish the two.
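A small recursive-descent sketch in Java shows where the distinction is made. The scanner emits one MINUS kind for every '-'; the parser treats a MINUS as unary when it appears where an operand is expected and as binary when it follows an operand. Integer literals replace the variable a so the result can be evaluated; the grammar, the class MinusDemo, and its token names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** The scanner emits one MINUS kind; the parser decides unary vs. binary by position (sketch). */
final class MinusDemo {
    enum Kind { NUM, MINUS, EOF }

    record Tok(Kind kind, int value) { }

    /** Scanner: knows nothing about unary vs. binary; every '-' becomes MINUS. */
    static List<Tok> scan(String s) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '-') {
                out.add(new Tok(Kind.MINUS, 0));
                i++;
            } else if (c >= '0' && c <= '9') {
                int v = 0;
                while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') v = v * 10 + (s.charAt(i++) - '0');
                out.add(new Tok(Kind.NUM, v));
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        out.add(new Tok(Kind.EOF, 0));
        return out;
    }

    private final List<Tok> toks;
    private int pos = 0;

    private MinusDemo(List<Tok> toks) { this.toks = toks; }

    // expr  ::= unary ( MINUS unary )*     a MINUS after an operand is binary
    private int expr() {
        int value = unary();
        while (toks.get(pos).kind() == Kind.MINUS) {
            pos++;
            value -= unary();
        }
        return value;
    }

    // unary ::= MINUS unary | NUM         a MINUS where an operand is expected is unary
    private int unary() {
        Tok t = toks.get(pos++);
        if (t.kind() == Kind.MINUS) return -unary();
        if (t.kind() == Kind.NUM) return t.value();
        throw new IllegalStateException("operand expected");
    }

    static int eval(String s) {
        MinusDemo p = new MinusDemo(scan(s));
        int v = p.expr();
        if (p.toks.get(p.pos).kind() != Kind.EOF) throw new IllegalStateException("trailing input");
        return v;
    }
}
```

In "3 - -2" the scanner produces two identical MINUS tokens; the parser reads the first as subtraction and the second as negation, giving 5.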

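Scanner “troublemakers”
- Unclosed strings
- Unclosed comments
In both cases the scanner may consume the rest of the line or file before it notices that the closing delimiter is missing.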
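Because the missing delimiter is only discovered at the end of the line or file, a useful message reports where the construct started. The Java sketch below assumes C/Java-style /* */ comments and double-quoted strings that may not span a line break; the class Troublemakers and its check method are hypothetical.

```java
/** Detecting unclosed comments and strings (illustrative sketch; C/Java-style delimiters assumed). */
final class Troublemakers {
    /** Returns null if every comment and string is closed, else a message naming where the unclosed one began. */
    static String check(String src) {
        int i = 0, n = src.length();
        while (i < n) {
            if (src.startsWith("/*", i)) {
                int close = src.indexOf("*/", i + 2);
                if (close < 0) return "unclosed comment starting at offset " + i;
                i = close + 2;
            } else if (src.charAt(i) == '"') {
                int j = i + 1;
                // a string may not span a line break; hitting '\n' or the end means it was never closed
                while (j < n && src.charAt(j) != '"' && src.charAt(j) != '\n') {
                    j += (src.charAt(j) == '\\' && j + 1 < n) ? 2 : 1;  // skip an escaped character
                }
                if (j >= n || src.charAt(j) != '"') return "unclosed string starting at offset " + i;
                i = j + 1;
            } else {
                i++;
            }
        }
        return null;
    }
}
```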

JavaCC as a Parsing Tool

JavaCC Overview
- Generates a top-down parser, so it could be used to generate a parser for Prolog, whose grammar is LL.
- Generates a parser in Java, so the parser can be integrated with any Java-based Prolog compiler or interpreter, to continue our example.
- Token specifications and grammar specifications are in the same file, which makes debugging easier.

Types of Productions in JavaCC
There are four different kinds of productions:
- JAVACODE: for something that is not context-free or is difficult to write a grammar for, e.g. recognizing matching braces and error processing.
- Regular expressions: used to describe the tokens (terminals) of the grammar.
- BNF: the standard way of specifying the productions of the grammar.
- Token manager declarations: declarations and statements written into the generated token manager (lexer) and accessible from within lexical actions.
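A JAVACODE production is essentially a hand-written Java method that the parser calls like a production. The plain-Java sketch below shows the kind of logic such a method might contain for skipping to a matching brace during error processing: a depth counter. The token list and the method name skipToMatchingBrace are hypothetical; inside a real JAVACODE production the tokens would come from getNextToken() instead of a list.

```java
import java.util.List;

/** Logic of a brace-skipping routine, as a JAVACODE production might implement it (sketch). */
final class BraceSkipper {
    /**
     * Given tokens and the index just after an opening "{", returns the index just after
     * its matching "}", or -1 if the input ends first.
     */
    static int skipToMatchingBrace(List<String> tokens, int start) {
        int depth = 1;
        for (int i = start; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (t.equals("{")) depth++;
            else if (t.equals("}") && --depth == 0) return i + 1;
        }
        return -1;
    }
}
```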

JavaCC Look-ahead Mechanism
Look-ahead is the exploration of tokens further ahead in the input stream. Backtracking is avoided because of its performance cost. By default JavaCC uses 1 token of look-ahead, but any number of tokens can be specified. There are two types of look-ahead mechanism:
- Syntactic: a particular token is looked ahead in the input stream.
- Semantic: any arbitrary Boolean expression can be specified as a look-ahead parameter.
Example: A -> aBc and B -> b ( c )?, with valid strings “abc” and “abcc”. After reading b, one token of look-ahead shows only c, which could belong either to B's optional ( c )? or to the final c of A; two tokens of look-ahead settle the choice.
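A minimal recursive-descent sketch in Java makes the difference visible for the grammar A -> aBc, B -> b ( c )?. With k = 1 the parser takes the optional c greedily whenever the next token is c, which is what JavaCC does by default when it reports this kind of choice conflict; with k = 2 it takes the optional c only when another c follows. Treating each character as one token, the '$' end marker, and the class LookaheadDemo are assumptions for illustration.

```java
/** A -> a B c ;  B -> b ( c )?   parsed by recursive descent with 1 or 2 tokens of look-ahead (sketch). */
final class LookaheadDemo {
    private final String in;   // each character is one token
    private final int k;       // look-ahead used when deciding whether to take the optional ( c )?
    private int pos = 0;

    private LookaheadDemo(String in, int k) {
        this.in = in;
        this.k = k;
    }

    private char peek(int offset) {   // '$' marks the end of input
        int i = pos + offset;
        return i < in.length() ? in.charAt(i) : '$';
    }

    private void expect(char t) {
        if (peek(0) != t) throw new IllegalStateException("expected " + t + " at " + pos);
        pos++;
    }

    private void parseA() {
        expect('a');
        parseB();
        expect('c');
    }

    private void parseB() {
        expect('b');
        // k = 1: greedy, take ( c )? whenever the next token is c.
        // k = 2: take it only if the token after that is also c, leaving one c for A.
        boolean takeOptional = (k == 1) ? peek(0) == 'c' : peek(0) == 'c' && peek(1) == 'c';
        if (takeOptional) expect('c');
    }

    static boolean accepts(String input, int k) {
        LookaheadDemo p = new LookaheadDemo(input, k);
        try {
            p.parseA();
            return p.pos == input.length();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc", "abcc"}) {
            System.out.println(s + ": k=1 " + accepts(s, 1) + ", k=2 " + accepts(s, 2));
        }
    }
}
```

With k = 1 the valid string “abc” is rejected, because B consumes the only c and A then finds the end of input; with k = 2 both valid strings are accepted.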
