14-Jul-15 Recognizers. 2 Parsers and recognizers Given a grammar (say, in BNF) and a string, A recognizer will tell whether the string belongs to the.

Presentation transcript:

14-Jul-15 Recognizers

2 Parsers and recognizers
Given a grammar (say, in BNF) and a string,
  A recognizer will tell whether the string belongs to the language defined by the grammar
  A parser will try to build a tree corresponding to the string, according to the rules of the grammar
Input string    Recognizer result    Parser result
* 4             true                 (parse tree)
*               false                Error

3 Building a recognizer
One way of building a recognizer from a grammar is called recursive descent
Recursive descent is pretty easy to implement, once you figure out the basic ideas
Recursive descent is a great way to build a “quick and dirty” recognizer or parser
Production-quality parsers use much more sophisticated and efficient techniques
In the following slides, I’ll talk about how to do recursive descent, and give some examples in Java

4 Recognizing simple alternatives, I
Consider the following BNF rule: <add_operator> ::= “+” | “-”
  That is, an add operator is a plus sign or a minus sign
To recognize an add operator, we need to get the next token, and test whether it is one of these characters
  If it is a plus or a minus, we simply return true
  But what if it isn’t? We not only need to return false, but we also need to put the token back, because it doesn’t belong to us, and some other grammar rule probably wants it
We need a tokenizer that can take back tokens
  We will make do with putting back only one token at a time

5 Creating a decorator class
A Decorator class is a class that extends another class and adds functionality
In this case, StringTokenizer may do what we need, except be able to take back tokens
We can decorate StringTokenizer and add the ability to take back tokens
  For simplicity, we’ll allow only a single token to be returned
Our decorator class will need a little extra storage, and will need to override and extend some methods

6 PushbackTokenizer I
    import java.util.StringTokenizer;
    public class PushbackTokenizer extends StringTokenizer {
        String pushedToken = null; // to hold returned token
        public PushbackTokenizer(String s) {
            super(s); // superclass has no default constructor
        }
        public void pushBack(String token) { // added method
            pushedToken = token;
        }

7 PushbackTokenizer II
        public boolean hasMoreTokens() {
            if (pushedToken != null) return true;
            else return super.hasMoreTokens();
        }
Notice that we overrode this method, but when we have nothing special to do, we just let the superclass’s method handle it
        public String nextToken() {
            if (pushedToken == null) return super.nextToken();
            // We only return a pushedToken once
            String result = pushedToken;
            pushedToken = null;
            return result;
        }
Again, we just use the superclass’s method, we don’t reinvent it

8 Sample use of PushbackTokenizer
    public static void main(String[] args) {
        PushbackTokenizer pb = new PushbackTokenizer("This is too cool");
        String token;
        System.out.print(pb.nextToken() + " "); // “This”
        System.out.print(pb.nextToken() + " "); // “is”
        System.out.print(token = pb.nextToken() + " "); // “too”
        pb.pushBack(token); // return “too”
        System.out.print(pb.nextToken() + " "); // get “too” again
        System.out.print(pb.nextToken() + " "); // “cool”
    }
Output: This is too too  cool
Question: Why the extra space?
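
Here is a minimal, self-contained sketch of the tokenizer from slides 6 through 8, put together so it can be run as one file. The wrapper class PushbackDemo and its run method are my own names, not part of the slides. It also shows one answer to the question: in `token = pb.nextToken() + " "`, the `+` is applied before the `=`, so the token that gets pushed back is "too " with its trailing space. Parenthesizing the assignment stores only "too".

```java
import java.util.StringTokenizer;

class PushbackDemo {
    // One-token pushback layered on java.util.StringTokenizer, as on the slides.
    static class PushbackTokenizer extends StringTokenizer {
        private String pushedToken = null; // holds the single returned token, if any

        PushbackTokenizer(String s) {
            super(s); // StringTokenizer has no no-argument constructor
        }

        void pushBack(String token) {
            pushedToken = token;
        }

        @Override
        public boolean hasMoreTokens() {
            return pushedToken != null || super.hasMoreTokens();
        }

        @Override
        public String nextToken() {
            if (pushedToken == null) return super.nextToken();
            String result = pushedToken; // hand the pushed token back exactly once
            pushedToken = null;
            return result;
        }
    }

    // Reads all tokens, pushing "too" back once so that it is seen twice.
    static String run() {
        PushbackTokenizer pb = new PushbackTokenizer("This is too cool");
        StringBuilder out = new StringBuilder();
        out.append(pb.nextToken()).append(" ");
        out.append(pb.nextToken()).append(" ");
        String token;
        out.append((token = pb.nextToken())).append(" "); // parentheses: token is just "too"
        pb.pushBack(token);
        while (pb.hasMoreTokens()) {
            out.append(pb.nextToken()).append(" ");
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(run()); // This is too too cool
    }
}
```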

9 Recognizing simple alternatives, II
Our rule is <add_operator> ::= “+” | “-”
Our method for recognizing an <add_operator> (which we will simply call addOperator) looks like this:
    public boolean addOperator() {
        Get the next token, call it t
        If t is a “+”, return true
        If t is a “-”, return true
        If t is anything else,
            put the token back
            return false
    }

10 Helper methods
We could turn the preceding pseudocode directly into Java code
  But we will be writing a lot of very similar code...
  ...and it won’t be very readable code
We should write some auxiliary or “helper” methods to hide some of the details for us
First helper method:
    private boolean symbol(String expectedSymbol)
  Gets the next token and tests whether it matches the expectedSymbol
  If it matches, return true
  If it doesn’t match, put the symbol back and return false
  We’ll look more closely at this method in a moment

11 Recognizing simple alternatives, III
Our rule is <add_operator> ::= “+” | “-”
Our pseudocode is:
    public boolean addOperator() {
        Get the next token, call it t
        If t is a “+”, return true
        If t is a “-”, return true
        If t is anything else,
            put the token back
            return false
    }
Thanks to our helper method, our actual Java code is:
    public boolean addOperator() {
        return symbol("+") || symbol("-");
    }

12 Categories of tokens
Tokens are always strings, but they come in a variety of kinds:
  Names: "limit", "y", "maxValue"
  Keywords: "if", "while", "instanceof"
  Numbers: "25", "3"
  Symbols: "+", "=", ";"
  Special: "\n", end_of_input
Instead of treating tokens as simple strings, it’s convenient to create a Token class that holds both its string value and a constant telling what kind of token it is
    class Token {
        String value;
        int type;
...and this class should define some constants to represent the various types:
        public static final int NAME = 1;
        public static final int SYMBOL = 2;
        etc.
If you are using Java 5.0 or later, this is what enums were invented for!
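
As a rough sketch of the enum idea, here is what a Token class might look like with an enum in place of the int constants. The names TokenType, KEYWORDS, and classify are my own assumptions for illustration. The slides don't say how tokens get classified, so classify is only a toy that looks at an already-split piece of text.

```java
import java.util.Set;

class TokenDemo {
    // The kinds of token listed on the slide, as an enum instead of int constants.
    enum TokenType { NAME, KEYWORD, NUMBER, SYMBOL, EOL, EOI }

    // A token pairs its text with its kind.
    static final class Token {
        private final TokenType type;
        private final String value;

        Token(TokenType type, String value) {
            this.type = type;
            this.value = value;
        }

        TokenType getType() { return type; }
        String getValue() { return value; }

        @Override
        public String toString() { return type + "(" + value + ")"; }
    }

    // Hypothetical, deliberately tiny keyword list for the demo.
    static final Set<String> KEYWORDS = Set.of("if", "while", "instanceof");

    // Hypothetical classifier: decides the kind of one piece of text.
    static Token classify(String text) {
        if (text.isEmpty()) return new Token(TokenType.EOI, text); // assumption: empty means end of input
        if (text.equals("\n")) return new Token(TokenType.EOL, text);
        if (KEYWORDS.contains(text)) return new Token(TokenType.KEYWORD, text);
        if (text.chars().allMatch(Character::isDigit)) return new Token(TokenType.NUMBER, text);
        if (Character.isJavaIdentifierStart(text.charAt(0))) return new Token(TokenType.NAME, text);
        return new Token(TokenType.SYMBOL, text);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"limit", "while", "25", "+"}) {
            System.out.println(classify(s));
        }
    }
}
```

With an enum, a mistyped kind is a compile-time error, and comparisons such as `type == t.getType()` still work unchanged.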

13 Implementing symbol
symbol gets a token, makes sure it’s a symbol, compares it to the desired value, possibly puts the token back, and returns true or false
We will want to do something similar for numbers, names, end of lines, and maybe end of input
It would be foolish to write and debug all of these separately
Again, we should use an auxiliary method
    private boolean symbol(String expectedSymbol) {
        return nextTokenMatches(Token.SYMBOL, expectedSymbol);
    }

14 nextTokenMatches #1
The nextTokenMatches method should:
  Get a token
  Compare types and values
  Return true if the token is as expected
  Put the token back and return false if it doesn’t match
    private boolean nextTokenMatches(int type, String value) {
        Token t = tokenizer.next();
        if (type == t.getType() && value.equals(t.getValue())) {
            return true;
        } else {
            tokenizer.pushBack(t);
            return false;
        }
    }

15 nextTokenMatches #2
The previous method is fine for symbols, but what if we only care about the type?
For example, we want to get a number, any number
We need to compare only type, not value, so we omit the value parameter and the value test:
    private boolean nextTokenMatches(int type) {
        Token t = tokenizer.next();
        if (type == t.getType()) return true;
        else tokenizer.pushBack(t);
        return false;
    }
The two versions of nextTokenMatches are difficult to combine and fairly small, so we won’t worry about the code duplication too much

16 addOperator reprise
    public boolean addOperator() {
        return symbol("+") || symbol("-");
    }
    private boolean symbol(String expectedSymbol) {
        return nextTokenMatches(Token.SYMBOL, expectedSymbol);
    }
    private boolean nextTokenMatches(int type, String value) {
        Token t = tokenizer.next();
        if (type == t.getType() && value.equals(t.getValue())) return true;
        else tokenizer.pushBack(t);
        return false;
    }
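
To see the three methods work together, here is a runnable sketch with both versions of nextTokenMatches. The slides never show the tokenizer behind `tokenizer.next()` and `tokenizer.pushBack(t)`, so the Tokenizer below is a stand-in I made up: it splits on whitespace, guesses each token's type from its first character, and backs up one position on pushBack. The Type enum and the number method are assumptions as well.

```java
import java.util.ArrayList;
import java.util.List;

class AddOperatorDemo {
    enum Type { NAME, NUMBER, SYMBOL, EOI }

    static final class Token {
        private final Type type;
        private final String value;
        Token(Type type, String value) { this.type = type; this.value = value; }
        Type getType() { return type; }
        String getValue() { return value; }
    }

    // Hypothetical tokenizer: whitespace-separated tokens, pushBack backs up one position.
    static final class Tokenizer {
        private final List<Token> tokens = new ArrayList<>();
        private int pos = 0;

        Tokenizer(String input) {
            for (String s : input.trim().split("\\s+")) {
                if (s.isEmpty()) continue;
                char c = s.charAt(0);
                Type type = Character.isDigit(c) ? Type.NUMBER
                          : Character.isLetter(c) ? Type.NAME : Type.SYMBOL;
                tokens.add(new Token(type, s));
            }
        }

        Token next() {
            return pos < tokens.size() ? tokens.get(pos++) : new Token(Type.EOI, "");
        }

        void pushBack(Token t) {
            if (t.getType() != Type.EOI) pos--; // EOI was never consumed, so nothing to undo
        }
    }

    private final Tokenizer tokenizer;

    AddOperatorDemo(String input) { tokenizer = new Tokenizer(input); }

    boolean addOperator() { return symbol("+") || symbol("-"); }

    boolean number() { return nextTokenMatches(Type.NUMBER); }

    private boolean symbol(String expectedSymbol) {
        return nextTokenMatches(Type.SYMBOL, expectedSymbol);
    }

    // Version 1: type and value must both match.
    private boolean nextTokenMatches(Type type, String value) {
        Token t = tokenizer.next();
        if (type == t.getType() && value.equals(t.getValue())) return true;
        tokenizer.pushBack(t);
        return false;
    }

    // Version 2: only the type must match.
    private boolean nextTokenMatches(Type type) {
        Token t = tokenizer.next();
        if (type == t.getType()) return true;
        tokenizer.pushBack(t);
        return false;
    }

    public static void main(String[] args) {
        AddOperatorDemo r = new AddOperatorDemo("- 7");
        System.out.println(r.addOperator()); // true: consumes "-"
        System.out.println(r.addOperator()); // false: "7" is put back
        System.out.println(r.number());      // true: "7" is still there
    }
}
```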

17 Sequences, I
Suppose we want to recognize a grammar rule in which one thing follows another, for example, <empty_list> ::= “[” “]”
The code for this would be fairly simple...
    public boolean emptyList() {
        return symbol("[") && symbol("]");
    }
...except for one thing...
  What happens if we get a “[” and don’t get a “]”?
  The above method won’t work. Why not?
  Only the second call to symbol failed, so only that one token gets pushed back; the “[” has already been consumed

18 Sequences, II
The grammar rule is <empty_list> ::= “[” “]”
And the token string contains [ 5 ]
Solution #1: Write a pushBack method that can keep track of more than one token at a time (say, in a Stack)
  This will allow you to put back both the “[” and the “5”
  The code gets pretty messy
  You have to be very careful of the order in which you return tokens
Solution #2: Call it an error
  You might be able to get away with this, depending on the grammar
  For example, for any reasonable grammar, ( ) is clearly an error
Solution #3: Change the grammar
  Tricky, and may not be possible
Solution #4: Combine rules
  See the next slide
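
Solution #1 might be sketched as follows. MultiPushbackTokenizer is a hypothetical name of mine. It keeps returned tokens on a stack (an ArrayDeque used as one) rather than in a single field, and emptyList shows the ordering concern: the last token read has to be pushed back first, so that the earliest token ends up on top.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.StringTokenizer;

class StackPushbackDemo {
    // Hypothetical variant of PushbackTokenizer that can take back several tokens.
    static final class MultiPushbackTokenizer extends StringTokenizer {
        private final Deque<String> pushed = new ArrayDeque<>(); // top = next token to return

        MultiPushbackTokenizer(String s) {
            super(s);
        }

        void pushBack(String token) {
            pushed.push(token);
        }

        @Override
        public boolean hasMoreTokens() {
            return !pushed.isEmpty() || super.hasMoreTokens();
        }

        @Override
        public String nextToken() {
            return pushed.isEmpty() ? super.nextToken() : pushed.pop();
        }
    }

    // Recognizes "[" "]"; on failure, restores every token it consumed.
    static boolean emptyList(MultiPushbackTokenizer tk) {
        if (!tk.hasMoreTokens()) return false;
        String first = tk.nextToken();
        if (!first.equals("[")) {
            tk.pushBack(first);
            return false;
        }
        if (!tk.hasMoreTokens()) {
            tk.pushBack(first);
            return false;
        }
        String second = tk.nextToken();
        if (second.equals("]")) return true;
        tk.pushBack(second); // the last token read goes back first...
        tk.pushBack(first);  // ...so the earliest token ends up on top
        return false;
    }

    public static void main(String[] args) {
        MultiPushbackTokenizer tk = new MultiPushbackTokenizer("[ 5 ]");
        System.out.println(emptyList(tk));  // false
        System.out.println(tk.nextToken()); // [
        System.out.println(tk.nextToken()); // 5
    }
}
```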

19 Sequences, III
Suppose the grammar really says <list> ::= “[” “]” | “[” <number> “]”
Now your pseudocode should look something like this:
    public boolean list() {
        if first token is “[” {
            if second token is “]” return true
            else if second token is a number {
                if third token is “]” return true
                else error
            }
            else put back first token
        }
    }
Revised grammar:
    <list> ::= “[” <rest_of_list>
    <rest_of_list> ::= “]” | <number> “]”
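
One way the combined rule could come out in Java is sketched below. To keep it self-contained I've replaced the tokenizer with an array of whitespace-separated tokens and an index, so "putting a token back" just means not advancing the index. That is an assumption of this sketch, not the slides' design. I've also chosen to treat anything other than “]” or a number after the “[” as an error (Solution #2), and error here returns an exception to throw. Both are my choices.

```java
class ListRecognizer {
    private final String[] tokens;
    private int pos = 0;

    ListRecognizer(String input) {
        String trimmed = input.trim();
        tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // <list> ::= "[" "]" | "[" <number> "]"
    boolean list() {
        if (!symbol("[")) return false;      // nothing consumed, nothing to put back
        if (symbol("]")) return true;        // "[" "]"
        if (number()) {
            if (symbol("]")) return true;    // "[" <number> "]"
            throw error("Expected ']' after the number");
        }
        throw error("Expected ']' or a number after '['");
    }

    // Consumes the next token only if it equals the expected symbol.
    private boolean symbol(String expected) {
        if (pos < tokens.length && tokens[pos].equals(expected)) {
            pos++;
            return true;
        }
        return false;
    }

    // Consumes the next token only if it is an unsigned integer.
    private boolean number() {
        if (pos < tokens.length && tokens[pos].matches("\\d+")) {
            pos++;
            return true;
        }
        return false;
    }

    private static IllegalStateException error(String message) {
        return new IllegalStateException(message);
    }

    public static void main(String[] args) {
        System.out.println(new ListRecognizer("[ ]").list());   // true
        System.out.println(new ListRecognizer("[ 5 ]").list()); // true
        System.out.println(new ListRecognizer("5 ]").list());   // false
    }
}
```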

20 Simple sequences in Java
Suppose you have this rule: <factor> ::= “(” <expression> “)”
A good way to do this is often to test whether the grammar rule is not met
    public boolean factor() {
        if (symbol("(")) {
            if (!expression()) error("Error in parenthesized expression");
            if (!symbol(")")) error("Unclosed parenthetical expression");
            return true;
        }
        return false;
    }

21 Sequences and alternatives
Here’s the real grammar rule for <factor>: <factor> ::= <name> | <number> | “(” <expression> “)”
And here’s the actual code:
    public boolean factor() {
        if (name()) return true;
        if (number()) return true;
        if (symbol("(")) {
            if (!expression()) error("Error in parenthesized expression");
            if (!symbol(")")) error("Unclosed parenthetical expression");
            return true;
        }
        return false;
    }

22 Recursion, I
Here’s an unfortunate (but legal!) grammar rule: <expression> ::= <expression> “+” <term>
Here’s some code for it:
    public boolean expression() {
        if (!expression()) return false;
        if (!addOperator()) return true;
        if (!term()) error("Error in expression after '+' or '-'");
        return true;
    }
Do you see the problem?
We aren’t recurring with a simpler case (expression calls itself before consuming any token), therefore we have an infinite recursion
Our grammar rule is left recursive (the recursive part is the leftmost thing in the definition)

23 Recursion, II
Here’s our unfortunate grammar rule again: <expression> ::= <expression> “+” <term>
Here’s an equivalent, right recursive rule: <expression> ::= <term> “+” <expression>
Here’s some (much happier!) code for it:
    public boolean expression() {
        if (!term()) return false;
        if (!addOperator()) return true;
        if (!expression()) error("Error in expression after '+' or '-'");
        return true;
    }
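
The right recursive expression method and the factor method from slide 21 can be combined into one small recognizer that actually runs. This is only a sketch under some assumptions of mine: tokens are whitespace-separated and held in an array, a term is just a factor (the slides never define it), error returns an exception to throw, and recognizes is a convenience method I added that also checks that all input was used.

```java
import java.util.regex.Pattern;

class ExpressionRecognizer {
    private final String[] tokens;
    private int pos = 0;

    ExpressionRecognizer(String input) {
        String trimmed = input.trim();
        tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // A term, optionally followed by an add operator and another expression.
    boolean expression() {
        if (!term()) return false;
        if (!addOperator()) return true;
        if (!expression()) throw error("Error in expression after '+' or '-'");
        return true;
    }

    // Assumption: a term is just a factor in this sketch.
    boolean term() { return factor(); }

    // <factor> ::= <name> | <number> | "(" <expression> ")"
    boolean factor() {
        if (name()) return true;
        if (number()) return true;
        if (symbol("(")) {
            if (!expression()) throw error("Error in parenthesized expression");
            if (!symbol(")")) throw error("Unclosed parenthetical expression");
            return true;
        }
        return false;
    }

    boolean addOperator() { return symbol("+") || symbol("-"); }

    private boolean name() { return nextMatches("[A-Za-z_]\\w*"); }
    private boolean number() { return nextMatches("\\d+"); }
    private boolean symbol(String s) { return nextMatches(Pattern.quote(s)); }

    // Consumes the next token only if it matches; otherwise leaves it in place.
    private boolean nextMatches(String regex) {
        if (pos < tokens.length && tokens[pos].matches(regex)) {
            pos++;
            return true;
        }
        return false;
    }

    private static IllegalStateException error(String message) {
        return new IllegalStateException(message);
    }

    // True if the whole input is one expression.
    static boolean recognizes(String input) {
        ExpressionRecognizer r = new ExpressionRecognizer(input);
        return r.expression() && r.pos == r.tokens.length;
    }

    public static void main(String[] args) {
        System.out.println(recognizes("2 + ( x - 3 )")); // true
        System.out.println(recognizes("+ 2"));           // false
    }
}
```

Each call to expression consumes at least one token (through term) before recursing, which is what keeps the recursion finite.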

24 Extended BNF: optional parts
Extended BNF uses brackets to indicate optional parts of rules
  Example: <if_statement> ::= “if” <condition> <statement> [ “else” <statement> ]
Pseudocode for this example:
    public boolean ifStatement() {
        if you don’t see “if”, return false
        if you don’t see a condition, return an error
        if you don’t see a statement, return an error
        if you see an “else” {
            if you see a statement, return true
            else return an error
        }
        else return true;
    }
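
The pseudocode for the optional part might translate to Java roughly as below. The slide leaves condition and statement undefined, so I've made up toy versions: a condition is a single name, and a statement is either a nested if statement or a name followed by “;”. Tokens are whitespace-separated, and error returns an exception to throw. All of these are assumptions.

```java
class IfRecognizer {
    private final String[] tokens;
    private int pos = 0;

    IfRecognizer(String input) {
        String trimmed = input.trim();
        tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // <if_statement> ::= "if" <condition> <statement> [ "else" <statement> ]
    boolean ifStatement() {
        if (!word("if")) return false;
        if (!condition()) throw error("Expected a condition after 'if'");
        if (!statement()) throw error("Expected a statement after the condition");
        if (word("else")) { // the bracketed part: only checked if "else" is present
            if (!statement()) throw error("Expected a statement after 'else'");
        }
        return true;
    }

    // Toy stand-in: a condition is a single name.
    boolean condition() { return name(); }

    // Toy stand-in: a statement is a nested if statement, or a name followed by ";".
    boolean statement() {
        if (ifStatement()) return true;
        if (!name()) return false;
        if (!word(";")) throw error("Expected ';' after statement");
        return true;
    }

    // Consumes the next token only if it is exactly the expected text.
    private boolean word(String expected) {
        if (pos < tokens.length && tokens[pos].equals(expected)) {
            pos++;
            return true;
        }
        return false;
    }

    // Consumes the next token only if it is an identifier that is not a keyword.
    private boolean name() {
        if (pos < tokens.length && tokens[pos].matches("[A-Za-z_]\\w*")
                && !tokens[pos].equals("if") && !tokens[pos].equals("else")) {
            pos++;
            return true;
        }
        return false;
    }

    private static IllegalStateException error(String message) {
        return new IllegalStateException(message);
    }

    static boolean recognizes(String input) {
        IfRecognizer r = new IfRecognizer(input);
        return r.ifStatement() && r.pos == r.tokens.length;
    }

    public static void main(String[] args) {
        System.out.println(recognizes("if x y ;"));          // true
        System.out.println(recognizes("if x y ; else z ;")); // true
        System.out.println(recognizes("y ;"));               // false
    }
}
```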

25 Extended BNF: zero or more
Extended BNF uses braces to indicate parts of a rule that can be repeated
  Example: <expression> ::= <term> { “+” <term> }
  Note that this is not a good definition for an expression
Pseudocode for example:
    public boolean expression() {
        if you don’t see a term, return false
        while you see a “+” {
            if you don’t see a term, return an error
        }
        return true
    }
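
In Java, the braces turn into a while loop. The sketch below assumes, for illustration only, that a term is an unsigned integer, that tokens are whitespace-separated, and that error returns an exception to throw.

```java
class RepetitionRecognizer {
    private final String[] tokens;
    private int pos = 0;

    RepetitionRecognizer(String input) {
        String trimmed = input.trim();
        tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // <expression> ::= <term> { "+" <term> }
    boolean expression() {
        if (!term()) return false;
        while (symbol("+")) { // zero or more repetitions of: "+" <term>
            if (!term()) throw error("Expected a term after '+'");
        }
        return true;
    }

    // Assumption: a term is an unsigned integer in this sketch.
    private boolean term() {
        if (pos < tokens.length && tokens[pos].matches("\\d+")) {
            pos++;
            return true;
        }
        return false;
    }

    // Consumes the next token only if it equals the expected symbol.
    private boolean symbol(String expected) {
        if (pos < tokens.length && tokens[pos].equals(expected)) {
            pos++;
            return true;
        }
        return false;
    }

    private static IllegalStateException error(String message) {
        return new IllegalStateException(message);
    }

    static boolean recognizes(String input) {
        RepetitionRecognizer r = new RepetitionRecognizer(input);
        return r.expression() && r.pos == r.tokens.length;
    }

    public static void main(String[] args) {
        System.out.println(recognizes("1"));         // true: zero repetitions
        System.out.println(recognizes("1 + 2 + 3")); // true: two repetitions
        System.out.println(recognizes("+ 1"));       // false: no leading term
    }
}
```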

26 Back to parsers
A parser is like a recognizer
The difference is that, when a parser recognizes something, it does something about it
Usually, what a parser does is build a tree
If the thing that is being parsed is a program, then
  You can write another program that “walks” the tree and executes the statements and expressions as it finds them; such a program is called an interpreter
  You can write a similar program that “walks” the tree and produces code in some other language (usually assembly language) that does the same thing; such a program is called a compiler
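
To make the recognizer/parser difference concrete, here is a small sketch in which the method for the “zero or more” rule returns a tree rather than a boolean, and a second method walks that tree the way a tiny interpreter would. The Node class, the use of null for “not recognized”, and the evaluate method are my own assumptions. The slides don't show how the tree is built.

```java
class TinyParser {
    // Tree node: a number leaf (no children) or a "+" node with two children.
    static final class Node {
        final String value;
        final Node left;
        final Node right;

        Node(String value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return left == null ? value : "(" + left + " " + value + " " + right + ")";
        }
    }

    private final String[] tokens;
    private int pos = 0;

    TinyParser(String input) {
        String trimmed = input.trim();
        tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // <expression> ::= <term> { "+" <term> }
    // Returns the tree, or null if no expression starts at the current token.
    Node expression() {
        Node tree = term();
        if (tree == null) return null;
        while (pos < tokens.length && tokens[pos].equals("+")) {
            pos++; // consume "+"
            Node right = term();
            if (right == null) throw new IllegalStateException("Expected a term after '+'");
            tree = new Node("+", tree, right); // grows to the left: ((1 + 2) + 3)
        }
        return tree;
    }

    // Assumption: a term is an unsigned integer, which becomes a leaf.
    private Node term() {
        if (pos < tokens.length && tokens[pos].matches("\\d+")) {
            return new Node(tokens[pos++], null, null);
        }
        return null;
    }

    // A tiny "interpreter": walks the tree and computes its value.
    static int evaluate(Node n) {
        if (n.left == null) return Integer.parseInt(n.value);
        return evaluate(n.left) + evaluate(n.right);
    }

    public static void main(String[] args) {
        Node tree = new TinyParser("1 + 2 + 3").expression();
        System.out.println(tree);           // ((1 + 2) + 3)
        System.out.println(evaluate(tree)); // 6
    }
}
```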

27 Conclusions
If you start with a BNF definition of a language,
  You can write a recursive descent recognizer to tell you whether an input string “belongs to” that language (is a valid program in that language)
  Writing such a recognizer is a “cookbook” exercise: you just follow the recipe and it works (hopefully)
  You can write a recursive descent parser to create a parse tree representing the program
  The parse tree can later be used to execute the program
BNF is purely about syntax
  BNF tells you what is legal, and how things are put together
  BNF has nothing to say about what things actually mean

28 The End