COMP261 2015 Parsing 3 of 4 Lectures 23.

Presentation transcript:

COMP261 Parsing 3 of 4, Lecture 23

Using the Scanner

Break the input into tokens.
Use a Scanner with a delimiter:

    public void parse(String input) {
        Scanner s = new Scanner(input);
        s.useDelimiter("\\s*(?=[(),])|(?<=[(),])\\s*");
        if (parseExpr(s)) {
            System.out.println("That is a valid expression");
        }
    }

This breaks the input into a sequence of tokens:
– whitespace next to a round bracket or comma is a separator and not part of any token
– tokens are also delimited at round brackets and commas, which are tokens in their own right.
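To see what this delimiter does, it can help to print the tokens directly. The following is a minimal sketch, not part of the lecture code: the class name TokenDemo and the helper tokens are made up for illustration, and only the delimiter string comes from the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class: lists the tokens produced by the slide's delimiter.
class TokenDemo {
    // Delimiter from the slide: optional whitespace before or after a bracket or comma.
    static final String DELIMITER = "\\s*(?=[(),])|(?<=[(),])\\s*";

    // Collects every token the Scanner produces for the given input.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Scanner s = new Scanner(input);
        s.useDelimiter(DELIMITER);
        while (s.hasNext()) {
            result.add(s.next());
        }
        return result;
    }

    public static void main(String[] args) {
        // Eleven tokens: add ( sub ( 10 , -5 ) , 45 )
        System.out.println(tokens("add(sub(10, -5), 45)"));
    }
}
```

The lookahead and lookbehind parts of the delimiter match an empty string next to a bracket or comma, which is why those characters come out as separate tokens even when no whitespace surrounds them.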

Looking at next token

Need to be able to look at the next token to work out which branch to take:
– Scanner has two forms of hasNext:
    s.hasNext(): is there another token in the scanner?
    s.hasNext("string to match"): is there another token, and does it match the string?

        if (s.hasNext("add")) { ...

– Can use this to peek at the next token without reading it.
– The string is treated as a regular expression!

        if (s.hasNext("[-+]?[0-9]+")) { ...

  is true if the next token is an integer.
– Good design for a parser, because the next token might be needed by another rule/method if it isn't the right one for this rule/method.
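The peeking behaviour can be checked with a few lines of code. This is a small sketch under the assumption of the default whitespace delimiter; PeekDemo and peekKind are illustrative names, not from the lecture.

```java
import java.util.Scanner;

// Hypothetical demo class: hasNext(pattern) inspects the next token without consuming it.
class PeekDemo {
    // Describes the next token; the scanner position is left unchanged.
    static String peekKind(Scanner s) {
        if (!s.hasNext()) { return "end"; }
        if (s.hasNext("[-+]?[0-9]+")) { return "integer"; }
        if (s.hasNext("add")) { return "add"; }
        return "other";
    }

    public static void main(String[] args) {
        Scanner s = new Scanner("-42 add foo");
        while (s.hasNext()) {
            String kind = peekKind(s);               // peek only
            System.out.println(kind + ": " + s.next()); // now actually read the token
        }
    }
}
```

Calling peekKind twice in a row gives the same answer, because only s.next() advances the scanner.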

Parsing Expressions (checking only)

    public boolean parseExpr(Scanner s) {
        if (s.hasNext("[-+]?[0-9]+")) { s.next(); return true; }
        if (s.hasNext("add")) { return parseAdd(s); }
        if (s.hasNext("sub")) { return parseSub(s); }
        if (s.hasNext("mul")) { return parseMul(s); }
        if (s.hasNext("div")) { return parseDiv(s); }
        return false;
    }

    public boolean parseAdd(Scanner s) {
        if (s.hasNext("add")) { s.next(); } else { return false; }
        // hasNext takes a regular expression, so the brackets must be escaped
        if (s.hasNext("\\(")) { s.next(); } else { return false; }
        if (!parseExpr(s)) { return false; }
        if (s.hasNext(",")) { s.next(); } else { return false; }
        if (!parseExpr(s)) { return false; }
        if (s.hasNext("\\)")) { s.next(); } else { return false; }
        return true;
    }

Parsing Expressions (checking only)

    public boolean parseSub(Scanner s) {
        if (s.hasNext("sub")) { s.next(); } else { return false; }
        if (s.hasNext("\\(")) { s.next(); } else { return false; }
        if (!parseExpr(s)) { return false; }
        if (s.hasNext(",")) { s.next(); } else { return false; }
        if (!parseExpr(s)) { return false; }
        if (s.hasNext("\\)")) { s.next(); } else { return false; }
        return true;
    }

The same for parseMul and parseDiv.

Parsing Expressions (checking only)

Alternative, given the similarity of Add, Sub, Mul, Div:

    public boolean parseExpr(Scanner s) {
        if (s.hasNext("[-+]?[0-9]+")) { s.next(); return true; }
        if (!s.hasNext("add|sub|mul|div")) { return false; }
        s.next();
        // brackets are escaped because hasNext takes a regular expression
        if (s.hasNext("\\(")) { s.next(); } else { return false; }
        if (!parseExpr(s)) { return false; }
        if (s.hasNext(",")) { s.next(); } else { return false; }
        if (!parseExpr(s)) { return false; }
        if (s.hasNext("\\)")) { s.next(); } else { return false; }
        return true;
    }
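Putting the pieces together, a complete checking-only recognizer might look like the sketch below. The class ExprChecker, the helper accept, and the final check that no tokens are left over are assumptions added for illustration; the grammar and the delimiter are the ones from the slides.

```java
import java.util.Scanner;

// Hypothetical self-contained recognizer for the add/sub/mul/div expression language.
class ExprChecker {
    static boolean parseExpr(Scanner s) {
        if (s.hasNext("[-+]?[0-9]+")) { s.next(); return true; }
        if (!s.hasNext("add|sub|mul|div")) { return false; }
        s.next();
        // Short-circuit evaluation stops at the first token that does not fit.
        return accept(s, "\\(") && parseExpr(s)
            && accept(s, ",") && parseExpr(s)
            && accept(s, "\\)");
    }

    // Illustrative helper: consumes the next token only if it matches the pattern.
    static boolean accept(Scanner s, String pattern) {
        if (!s.hasNext(pattern)) { return false; }
        s.next();
        return true;
    }

    // Assumption: a valid input is one expression with nothing after it.
    static boolean isValid(String input) {
        Scanner s = new Scanner(input);
        s.useDelimiter("\\s*(?=[(),])|(?<=[(),])\\s*");
        return parseExpr(s) && !s.hasNext();
    }

    public static void main(String[] args) {
        System.out.println(isValid("add(sub(10, -5), 45)")); // well formed
        System.out.println(isValid("add(1, 2"));             // missing ')'
    }
}
```

Note that the bracket patterns are written "\\(" and "\\)": an unescaped "(" is not a valid regular expression, so passing it to hasNext would throw a PatternSyntaxException.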

How do we construct a parse tree?

Given our grammar:

    Expr ::= Num | Add | Sub | Mul | Div
    Add  ::= "add" "(" Expr "," Expr ")"
    Sub  ::= "sub" "(" Expr "," Expr ")"
    Mul  ::= "mul" "(" Expr "," Expr ")"
    Div  ::= "div" "(" Expr "," Expr ")"
    Num  ::= an optional sign followed by a sequence of digits: [-+]?[0-9]+

And an expression:

    add(sub(10, -5), 45)

First goal is a concrete parse tree:

How do we construct a parse tree?

[Figure: concrete parse tree for add(sub(10, -5), 45), with Expr, Add, Sub and Num nodes above the terminal tokens add, sub, the brackets, the commas, 10, -5 and 45.]

Modifying parser to produce parse tree

Need to have Node classes to represent the syntax tree:
– Expression Nodes contain a number or an Add/Sub/Mul/Div
– Add, Sub, Mul, Div nodes
– Number Nodes
– Terminal Nodes for the terminal values, which just contain a string.

Need classes for nodes and leaves

    interface Node { }

    class ExprNode implements Node {
        final Node child;
        public ExprNode(Node ch) { child = ch; }
        public String toString() { return "[" + child + "]"; }
    }

    class NumNode implements Node {
        final int value;
        public NumNode(int v) { value = v; }
        public String toString() { return value + ""; }
    }

    class TerminalNode implements Node {
        final String value;
        public TerminalNode(String v) { value = v; }
        public String toString() { return value; }
    }

Need classes for nodes and leaves

    // requires: import java.util.ArrayList;
    class AddNode implements Node {
        final ArrayList<Node> children;
        public AddNode(ArrayList<Node> chn) { children = chn; }
        public String toString() {
            String result = "[";
            for (Node n : children) { result += n.toString(); }
            return result + "]";
        }
    }

    class SubNode implements Node {...

Modifying parser to produce parse tree

Make the parser throw an exception if there is an error:
– each method either returns a valid Node, or it throws an exception.
– the fail method throws the exception, constructing a message that includes the context (up to five of the following tokens).

    public void fail(String errorMsg, Scanner s) {
        String msg = "Parse Error: " + errorMsg;
        for (int i = 0; i < 5 && s.hasNext(); i++) {
            msg += " " + s.next();
        }
        throw new RuntimeException(msg);
    }

⇒ Parse Error: no 34 ), mul (
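The effect of appending the next few tokens can be seen in a small experiment. This sketch assumes the message is simply the error text followed by up to five tokens separated by spaces; FailDemo and messageFor are illustrative names.

```java
import java.util.Scanner;

// Hypothetical demo of a fail method that reports the tokens following the error.
class FailDemo {
    // Builds "Parse Error: <msg>" plus up to five upcoming tokens, then throws.
    static void fail(String errorMsg, Scanner s) {
        StringBuilder msg = new StringBuilder("Parse Error: " + errorMsg);
        for (int i = 0; i < 5 && s.hasNext(); i++) {
            msg.append(" ").append(s.next());
        }
        throw new RuntimeException(msg.toString());
    }

    // Illustrative helper: returns the message fail would produce for the remaining input.
    static String messageFor(String remainingInput, String errorMsg) {
        Scanner s = new Scanner(remainingInput);
        s.useDelimiter("\\s*(?=[(),])|(?<=[(),])\\s*");
        try {
            fail(errorMsg, s);
            return null; // not reached: fail always throws
        } catch (RuntimeException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(messageFor("34), mul(2, 3)", "no ','"));
    }
}
```

Because fail consumes tokens while building the message, the scanner should not be reused for parsing after an error.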

Modifying parser to produce parse tree

    public Node parseExpr(Scanner s) {
        if (!s.hasNext()) { fail("Empty expr", s); }
        Node child = null;
        if (s.hasNext("-?\\d+")) { child = parseNumNode(s); }
        else if (s.hasNext("add")) { child = parseAddNode(s); }
        else if (s.hasNext("sub")) { child = parseSubNode(s); }
        else if (s.hasNext("mul")) { child = parseMulNode(s); }
        else if (s.hasNext("div")) { child = parseDivNode(s); }
        else { fail("not an expression", s); }
        return new ExprNode(child);
    }

    public Node parseNumNode(Scanner s) {
        if (!s.hasNextInt()) { fail("not an integer", s); }
        return new NumNode(s.nextInt());
    }

Modifying parser to produce parse tree

    public Node parseAddNode(Scanner s) {
        ArrayList<Node> children = new ArrayList<Node>();
        if (!s.hasNext("add")) { fail("no 'add'", s); }
        children.add(new TerminalNode(s.next()));
        // brackets are escaped because hasNext takes a regular expression
        if (!s.hasNext("\\(")) { fail("no '('", s); }
        children.add(new TerminalNode(s.next()));
        children.add(parseExpr(s));
        if (!s.hasNext(",")) { fail("no ','", s); }
        children.add(new TerminalNode(s.next()));
        children.add(parseExpr(s));
        if (!s.hasNext("\\)")) { fail("no ')'", s); }
        children.add(new TerminalNode(s.next()));
        return new AddNode(children);
    }

What about abstract syntax trees?

Do we need all the stuff in the concrete parse tree?

An abstract syntax tree doesn't need:
– literal strings from rules
– useless nodes: Expr, tokens under Num

[Figure: the abstract syntax tree for add(sub(10, -5), 45) (Add over Sub and N 45; Sub over N 10 and N -5), shown beside the concrete parse tree with its Expr, Add, Sub and Num nodes and terminal tokens.]

Simplify the node classes

Only need the two arguments:

    interface Node {}

    class AddNode implements Node {
        private Node left, right;
        public AddNode(Node lt, Node rt) { left = lt; right = rt; }
        public String toString() { return "add(" + left + "," + right + ")"; }
    }

    class SubNode implements Node {
        private Node left, right;
        public SubNode(Node lt, Node rt) { left = lt; right = rt; }
        public String toString() { return "sub(" + left + "," + right + ")"; }
    }

    class MulNode implements Node {...

Numbers stay the same

    class NumberNode implements Node {
        private int value;
        public NumberNode(int value) { this.value = value; }
        public String toString() { return "" + value; }
    }

    public Node parseNumber(Scanner s) {
        if (!s.hasNext("[-+]?\\d+")) { fail("Expecting a number", s); }
        return new NumberNode(s.nextInt());
    }

parseExpr is simpler

Don't need to create an Expr node that contains a node:
– Just return the node!

    public Node parseExpr(Scanner s) {
        if (s.hasNext("-?\\d+")) { return parseNumber(s); }
        if (s.hasNext("add")) { return parseAdd(s); }
        if (s.hasNext("sub")) { return parseSub(s); }
        if (s.hasNext("mul")) { return parseMul(s); }
        if (s.hasNext("div")) { return parseDiv(s); }
        fail("Unknown or missing expr", s);
        return null;
    }

parseAdd etc are simpler

Don't need so many children:

    public Node parseAdd(Scanner s) {
        Node left, right;
        if (s.hasNext("add")) { s.next(); } else { fail("Expecting add", s); }
        // brackets are escaped because hasNext takes a regular expression
        if (s.hasNext("\\(")) { s.next(); } else { fail("Missing '('", s); }
        left = parseExpr(s);
        if (s.hasNext(",")) { s.next(); } else { fail("Missing ','", s); }
        right = parseExpr(s);
        if (s.hasNext("\\)")) { s.next(); } else { fail("Missing ')'", s); }
        return new AddNode(left, right);
    }

Good error messages will help you debug your parser.
Highly repetitive structure!!

Making parseAdd etc even simpler

    public Node parseAdd(Scanner s) {
        Node left, right;
        require("add", "Expecting add", s);
        require("\\(", "Missing '('", s);   // pat is a regular expression, so brackets are escaped
        left = parseExpr(s);
        require(",", "Missing ','", s);
        right = parseExpr(s);
        require("\\)", "Missing ')'", s);
        return new AddNode(left, right);
    }

    // consumes (and returns) next token if it matches pat, reports error if not
    public String require(String pat, String msg, Scanner s) {
        if (s.hasNext(pat)) { return s.next(); }
        else { fail(msg, s); return null; }
    }
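An end-to-end version of the AST parser, built around require, could look like the following sketch. To keep it short it uses one hypothetical OpNode class in place of the four near-identical AddNode/SubNode/MulNode/DivNode classes, reads numbers with Integer.parseInt, and rejects leftover tokens; those choices, and the names AstParser and parse, are assumptions made for the example.

```java
import java.util.Scanner;

interface Node {}

class NumberNode implements Node {
    private final int value;
    NumberNode(int value) { this.value = value; }
    public String toString() { return "" + value; }
}

// Hypothetical stand-in for AddNode, SubNode, MulNode and DivNode.
class OpNode implements Node {
    private final String op;
    private final Node left, right;
    OpNode(String op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
    public String toString() { return op + "(" + left + "," + right + ")"; }
}

class AstParser {
    static Node parse(String input) {
        Scanner s = new Scanner(input);
        s.useDelimiter("\\s*(?=[(),])|(?<=[(),])\\s*");
        Node result = parseExpr(s);
        if (s.hasNext()) { fail("Unexpected token after expression", s); }
        return result;
    }

    static Node parseExpr(Scanner s) {
        if (s.hasNext("[-+]?\\d+")) { return new NumberNode(Integer.parseInt(s.next())); }
        if (s.hasNext("add|sub|mul|div")) { return parseOp(s); }
        fail("Unknown or missing expr", s);
        return null; // not reached: fail always throws
    }

    static Node parseOp(Scanner s) {
        String op = require("add|sub|mul|div", "Expecting an operator", s);
        require("\\(", "Missing '('", s);
        Node left = parseExpr(s);
        require(",", "Missing ','", s);
        Node right = parseExpr(s);
        require("\\)", "Missing ')'", s);
        return new OpNode(op, left, right);
    }

    // Consumes and returns the next token if it matches pat, otherwise reports an error.
    static String require(String pat, String msg, Scanner s) {
        if (s.hasNext(pat)) { return s.next(); }
        fail(msg, s);
        return null; // not reached
    }

    static void fail(String errorMsg, Scanner s) {
        StringBuilder msg = new StringBuilder("Parse Error: " + errorMsg);
        for (int i = 0; i < 5 && s.hasNext(); i++) { msg.append(" ").append(s.next()); }
        throw new RuntimeException(msg.toString());
    }

    public static void main(String[] args) {
        System.out.println(parse("add(sub(10, -5), 45)")); // add(sub(10,-5),45)
    }
}
```

Printing the tree through toString gives back the expression in a normalised form, which is a quick way to confirm the tree has the expected shape.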

What can we do with an AST?

We can "execute" parse trees in AST form:

    interface Node {
        public int evaluate();
    }

    class NumberNode implements Node { ...
        public int evaluate() { return this.value; }
    }

    class AddNode implements Node { ...
        public int evaluate() { return left.evaluate() + right.evaluate(); }
        ...
    }

Recursive DFS evaluation of the expression tree.
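As a worked example, the tree for add(sub(10, -5), 45) evaluates to (10 - (-5)) + 45 = 60. The sketch below builds that tree by hand and evaluates it; the SubNode body and the EvalDemo class are filled in here for illustration, following the pattern the slide shows for AddNode.

```java
interface Node {
    int evaluate();
}

class NumberNode implements Node {
    private final int value;
    NumberNode(int value) { this.value = value; }
    public int evaluate() { return value; }
}

class AddNode implements Node {
    private final Node left, right;
    AddNode(Node lt, Node rt) { left = lt; right = rt; }
    // Evaluates both subtrees first, then combines the results (depth-first).
    public int evaluate() { return left.evaluate() + right.evaluate(); }
}

// Assumed to mirror AddNode, with subtraction instead of addition.
class SubNode implements Node {
    private final Node left, right;
    SubNode(Node lt, Node rt) { left = lt; right = rt; }
    public int evaluate() { return left.evaluate() - right.evaluate(); }
}

// Hypothetical demo class.
class EvalDemo {
    // Hand-built AST for add(sub(10, -5), 45).
    static Node exampleTree() {
        return new AddNode(new SubNode(new NumberNode(10), new NumberNode(-5)),
                           new NumberNode(45));
    }

    public static void main(String[] args) {
        System.out.println(exampleTree().evaluate()); // 60
    }
}
```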

What can we do with an AST?

We can print expressions in other forms:

    class AddNode implements Node {
        private Node left, right;
        public AddNode(Node lt, Node rt) { left = lt; right = rt; }
        public int evaluate() { return left.evaluate() + right.evaluate(); }
        public String toString() { return "(" + left + " + " + right + ")"; }
    }

Prints in regular infix notation (with brackets).

Nicer Language

Allow floating point numbers as well as integers:
– need a more complex pattern for numbers.

    class NumberNode implements Node {
        final double value;
        public NumberNode(double v) { value = v; }
        public String toString() { return String.format("%.5f", value); }
        public double evaluate() { return value; }
    }
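The slide does not give the more complex pattern, so the one below is only a guess at something suitable: an optional sign followed by digits with an optional fraction, or a bare fraction such as .5. The names NumberPatternDemo, NUMBER and read are illustrative, and exponent notation is deliberately left out.

```java
import java.util.Scanner;

// Hypothetical demo of a token pattern that accepts integers and simple decimals.
class NumberPatternDemo {
    // Assumed pattern, not from the slides: sign, then "digits[.digits]" or ".digits".
    static final String NUMBER = "[-+]?(\\d+(\\.\\d*)?|\\.\\d+)";

    // Reads the next token as a double, or throws if it is not a number.
    static double parseNumber(Scanner s) {
        if (!s.hasNext(NUMBER)) {
            throw new RuntimeException("Parse Error: Expecting a number");
        }
        return Double.parseDouble(s.next());
    }

    // Illustrative helper: parses a single-token input.
    static double read(String input) {
        Scanner s = new Scanner(input);
        s.useDelimiter("\\s*(?=[(),])|(?<=[(),])\\s*");
        return parseNumber(s);
    }

    public static void main(String[] args) {
        System.out.println(read("3.25"));
        System.out.println(read("-5"));
    }
}
```

Using Double.parseDouble on the matched token, rather than the scanner's own numeric methods, keeps the accepted syntax defined by the pattern alone.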

Nicer Language

Extend the language to allow 2 or more arguments:

    Expr ::= Num | Add | Sub | Mul | Div
    Add  ::= "add" "(" Expr [ "," Expr ]+ ")"
    Sub  ::= "sub" "(" Expr [ "," Expr ]+ ")"
    Mul  ::= "mul" "(" Expr [ "," Expr ]+ ")"
    Div  ::= "div" "(" Expr [ "," Expr ]+ ")"

Example: "add(45, 16, sub(10, 5, 1), 34)"

[Figure: AST with an Add root whose children are Num 45, Num 16, Sub and Num 34; the Sub node has children Num 10, Num 5 and Num 1.]

sub(16, 8, 2, 1) = 16 – 8 – 2 – 1

Node Classes

    class NumberNode implements Node {
        final double value;
        public NumberNode(double v) { value = v; }
        public String toString() { return String.format("%.5f", value); }
        public double evaluate() { return value; }
    }

Node Classes

    // requires: import java.util.List;
    class AddNode implements Node {
        final List<Node> args;
        public AddNode(List<Node> nds) { args = nds; }
        public String toString() {
            String ans = "(" + args.get(0);
            for (int i = 1; i < args.size(); i++) { ans += " + " + args.get(i); }
            return ans + ")";
        }
        public double evaluate() {
            double ans = 0;
            for (Node nd : args) { ans += nd.evaluate(); }
            return ans;
        }
    }
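The parsing side of the extended grammar is not shown on the slides, so here is one way it might be done: after the first argument, keep reading "," Expr pairs while the next token is a comma. The sketch uses a single hypothetical NaryNode class for all four operators (applied left to right, so sub(16, 8, 2, 1) is 16 - 8 - 2 - 1) and accepts only integer literals to stay short; NaryParser and its method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

interface Node { double evaluate(); }

class NumberNode implements Node {
    final double value;
    NumberNode(double v) { value = v; }
    public double evaluate() { return value; }
}

// Hypothetical node covering add, sub, mul and div with two or more arguments.
class NaryNode implements Node {
    final String op;
    final List<Node> args;
    NaryNode(String op, List<Node> args) { this.op = op; this.args = args; }
    public double evaluate() {
        double ans = args.get(0).evaluate();
        for (int i = 1; i < args.size(); i++) {
            double v = args.get(i).evaluate();
            switch (op) {
                case "add": ans += v; break;
                case "sub": ans -= v; break;
                case "mul": ans *= v; break;
                default:    ans /= v; break; // "div"
            }
        }
        return ans;
    }
}

class NaryParser {
    static Node parse(String input) {
        Scanner s = new Scanner(input);
        s.useDelimiter("\\s*(?=[(),])|(?<=[(),])\\s*");
        return parseExpr(s);
    }

    static Node parseExpr(Scanner s) {
        if (s.hasNext("[-+]?\\d+")) { return new NumberNode(Double.parseDouble(s.next())); }
        String op = require("add|sub|mul|div", "Unknown or missing expr", s);
        require("\\(", "Missing '('", s);
        List<Node> args = new ArrayList<>();
        args.add(parseExpr(s));
        do { // [ "," Expr ]+ : at least one more argument is required
            require(",", "Missing ','", s);
            args.add(parseExpr(s));
        } while (s.hasNext(","));
        require("\\)", "Missing ')'", s);
        return new NaryNode(op, args);
    }

    static String require(String pat, String msg, Scanner s) {
        if (s.hasNext(pat)) { return s.next(); }
        throw new RuntimeException("Parse Error: " + msg);
    }

    public static void main(String[] args) {
        System.out.println(parse("sub(16, 8, 2, 1)").evaluate());               // 5.0
        System.out.println(parse("add(45, 16, sub(10, 5, 1), 34)").evaluate()); // 99.0
    }
}
```

The do-while loop mirrors the [ "," Expr ]+ part of the grammar: its body runs once unconditionally, so an operator with a single argument is rejected.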