The Recursive Descent Algorithm

The Recursive Descent Algorithm
A useful predictive parser for many applications.
Under Construction (Nov 16)

The Recursive Descent Algorithm
The recursive descent algorithm directly implements a grammar written as EBNF rules.
- The rules must not contain left recursion.
- There is one function (method) for each EBNF rule.
- Each method parses the input corresponding to its EBNF rule and returns a value. The value may be:
    - a node in the abstract syntax tree of the input
    - a value computed by evaluating the input (e.g. a calculator)
- Recursive descent is a predictive parser. Limited look-ahead ("peek" at the next token) can be incorporated.

Recursive-descent intro (0)
Grammar:
    expr   => expr + term | expr - term | term
    term   => term * factor | factor
    factor => '(' expr ')' | number

Recursive-descent intro (0.5)
Grammar in EBNF (no "self-recursion", i.e. no rule begins with a reference to itself):
    expr   => term { ( + | - ) term }
    term   => factor { * factor }
    factor => '(' expr ')' | number
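Why the EBNF form still gives the right answer: the repetition { ... } becomes a loop that folds each new operand into the result from the left, so "8-3-2" is still read as (8-3)-2. A minimal sketch of that idea, assuming single-digit operands and only the '-' operator (the class name LeftAssocSketch is hypothetical, not from the slides):

```java
// Sketch only: single-digit operands, '-' operator, no whitespace handling.
class LeftAssocSketch {
    private final String input;
    private int pos = 0;   // index of the next unread character

    LeftAssocSketch(String input) { this.input = input; }

    // expr => digit { '-' digit }
    int expr() {
        int value = digit();
        while (pos < input.length() && input.charAt(pos) == '-') {
            pos++;                     // consume '-'
            value = value - digit();   // fold the new operand in from the left
        }
        return value;
    }

    int digit() {
        if (pos >= input.length() || !Character.isDigit(input.charAt(pos)))
            throw new IllegalArgumentException("digit expected at position " + pos);
        return input.charAt(pos++) - '0';
    }

    public static void main(String[] args) {
        System.out.println(new LeftAssocSketch("8-3-2").expr());   // (8-3)-2
    }
}
```

A method written directly from the left-recursive rule expr => expr - term would call itself before consuming any input and never terminate, which is why the rule is rewritten first.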

Recursive-descent intro (1)
Grammar:
    expr   => term { + term }
    term   => factor { * factor }
    factor => '(' expr ')' | number
Generic C code for concept only (don't use this):
    expr() {
        term();
        while (token == '+') {
            match('+');
            term();
        }
    }
    term() {
        factor();
        while (token == '*') {
            match('*');
            factor();
        }
    }

Recursive-descent intro (2)
Grammar:
    expr   => term { + term }
    term   => factor { * factor }
    factor => '(' expr ')' | number
Factor and number:
    factor() {
        if (token == '(') {
            match('(');
            expr( );
            match(')');
        }
        else number( );
    }
    number() {
        if ( isNumber(token) ) {
            add_to_parse_tree();
            nextToken( );
        }
        else error("invalid number");
    }

Recursive-descent intro (3)
match(value) is a utility that requires a match: if the current token matches the argument, consume the token and get the next token. Otherwise print an error... and then what?
    void match(char what) {
        if ( *token == what ) {
            nextToken( );
        }
        else {
            /* 'printf' style error function */
            error("expected %c got %s", what, token);
        }
    }

Where's the token?
In this algorithm, token is a global variable that always contains the next unread token.
nextToken() sets the token variable and returns true if there are more tokens.
    boolean nextToken( ) {
        token = scanner( );
        return ( token != EOF );
    }
Another utility function is match(value):
1) if value matches token, get a new token
2) if value doesn't match, raise an error condition.
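Putting the pieces of the generic algorithm together, here is a rough sketch of a recognizer (it only answers yes or no, it builds nothing). Static fields stand in for the global variables. Assumptions not in the slides: every token is a single character, numbers are single digits, and '\0' marks the end of input. The class name RecognizerSketch and the recognize() helper are hypothetical.

```java
class RecognizerSketch {
    static final char EOF = '\0';   // assumed end-of-input marker
    static String input;            // "global" input text
    static int pos;                 // index of the next unread character
    static char token;              // always holds the next unread token

    // Sets token to the next non-blank character; returns false at end of input.
    static boolean nextToken() {
        while (pos < input.length() && input.charAt(pos) == ' ') pos++;
        token = (pos < input.length()) ? input.charAt(pos++) : EOF;
        return token != EOF;
    }

    // Requires the current token to equal 'what', then moves on.
    static void match(char what) {
        if (token == what) nextToken();
        else throw new IllegalStateException("expected " + what + " got " + token);
    }

    // expr => term { + term }
    static void expr()   { term();   while (token == '+') { match('+'); term(); } }
    // term => factor { * factor }
    static void term()   { factor(); while (token == '*') { match('*'); factor(); } }
    // factor => '(' expr ')' | number
    static void factor() {
        if (token == '(') { match('('); expr(); match(')'); }
        else number();
    }
    static void number() {
        if (Character.isDigit(token)) nextToken();
        else throw new IllegalStateException("invalid number: " + token);
    }

    // True if the whole text is a valid expr.
    static boolean recognize(String text) {
        input = text;
        pos = 0;
        nextToken();
        try { expr(); return token == EOF; }
        catch (IllegalStateException e) { return false; }
    }

    public static void main(String[] args) {
        System.out.println(recognize("2 * (3 + 4)"));   // valid expression
        System.out.println(recognize("2 * + 4"));       // operand missing after '*'
    }
}
```

Here the "raise an error condition" step is an unchecked exception caught in recognize(); that is one possible choice among several.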

Where's the output?
In the generic algorithm, the result is a global variable. The methods must either return a value or accumulate a value as a side effect. Rules that have terminal values should return the terminal value.
    factor => ( expr ) | number
    number() {
        if ( isNumber(token) ) {
            // add token to the parse tree
            // or return a value
        }
        else error("invalid number");
    }

Recursive Descent Example (1)
Let's look at recursive descent code for a calculator. We will modify the generic algorithm so that each function returns a double value.
    input:  expr '\n'
    expr:   term { (+|-) term }
    term:   factor { (*|/) factor }
    factor: '(' expr ')' | number

Recursive Descent Example (1, continued)
Example: here is a modified expr( ) function.
Grammar Rule: expr: term { (+|-) term }
    double expr() {
        double expr = term();
        while ( token == '+' || token == '-' ) {
            if (token == '+') {
                match('+');
                expr = expr + term();
            }
            else {
                match('-');
                expr = expr - term();
            }
        }
        return expr;
    }

Recursive Descent Example (2)
The rule for factor is more interesting: we must check the first token to decide which alternative to use.
Grammar Rule: factor: '(' expr ')' | number
    double factor() {
        double fact;
        if ( token == '(' ) {
            match( '(' );
            fact = expr( );
            match( ')' );
        }
        else {
            fact = number( );
        }
        return fact;
    }

Recursive Descent Example (3)
Input: 2 * 3 + ( 4 - 5 ) / 6
Progress: token = "2"
Call stack:
    input line:  nextToken( ); ans = expr( );

Recursive Descent Example (4)
Input: 2 * 3 + ( 4 - 5 ) / 6
Progress: token = "2"
Call stack:
    input line:  ans = expr( );
    expr:        expr = term( );
Code for expr:
    expr( ) {
        expr = term( );
        while ( token=='+' || token=='-' ) {
            if ( token=='+' ) { match('+'); expr = expr + term( ); }
            else { match('-'); expr = expr - term( ); }
        }
        return expr;
    }

Recursive Descent Example (5)
Input: 2 * 3 + ( 4 - 5 ) / 6
Progress: token = "2"
Call stack:
    input line:  ans = expr( );
    expr:        expr = term( );
    term:        term = factor( );
Code for term:
    term( ) {
        term = factor( );
        while ( token=='*' || token=='/' ) {
            if ( token=='*' ) { match('*'); term = term * factor( ); }
            else { match('/'); term = term / factor( ); }
        }
        return term;
    }

Recursive Descent Example (6)
Input: 2 * 3 + ( 4 - 5 ) / 6
Progress: token = "*"
Call stack:
    input line:  ans = expr( );
    expr:        expr = term( );
    term:        term = factor( );
    factor:      fact = number( ); /* = 2, token = '*' */ return fact
Code for factor:
    factor( ) {
        if ( token=='(' ) { match('('); fact = expr( ); match(')'); }
        else { fact = number( ); }
        return fact;
    }

Recursive Descent Example (7)
Input: 2 * 3 + ( 4 - 5 ) / 6
Progress: token = "3"
Call stack:
    input line:  ans = expr( );
    expr:        expr = term( );
    term:        term = 2; match('*'); term = term * factor( );
Code for term:
    term( ) {
        term = factor( );
        while ( token=='*' || token=='/' ) {
            if ( token=='*' ) { match('*'); term = term * factor( ); }
            else { match('/'); term = term / factor( ); }
        }
        return term;
    }

Recursive Descent Example (8)
Input: 2 * 3 + ( 4 - 5 ) / 6
Progress: token = "+"
Call stack:
    input line:  ans = expr( );
    expr:        expr = term( );
    term:        term = term * factor( );
    factor:      fact = number( ); /* = 3, token = '+' */ return fact
Code for factor:
    factor( ) {
        if ( token=='(' ) { match('('); fact = expr( ); match(')'); }
        else { fact = number( ); }
        return fact;
    }

Recursive Descent Example (9)
Input: 2 * 3 + ( 4 - 5 ) / 6
Progress: token = "+"
Call stack:
    input line:  ans = expr( );
    expr:        expr = term( );
    term:        term = term * 3; return term
Code for term:
    term( ) {
        term = factor( );
        while ( token=='*' || token=='/' ) {
            if ( token=='*' ) { match('*'); term = term * factor( ); }
            else { match('/'); term = term / factor( ); }
        }
        return term;
    }

Recursive Descent Example (10)
Input: 2 * 3 + ( 4 - 5 ) / 6
Progress: token = "("
Call stack:
    input line:  ans = expr( );
    expr:        expr = 6; /* token = '+' */ match('+'); expr = expr + term( );
Code for expr:
    expr( ) {
        expr = term( );
        while ( token=='+' || token=='-' ) {
            if ( token=='+' ) { match('+'); expr = expr + term( ); }
            else { match('-'); expr = expr - term( ); }
        }
        return expr;
    }

Recursive Descent Example (11)
Input: 2 * 3 + ( 4 - 5 ) / 6
Progress: token = "("
Call stack:
    input line:  ans = expr( );
    expr:        expr = expr + term( );
    term:        term = factor( );
Code for term:
    term( ) {
        term = factor( );
        while ( token=='*' || token=='/' ) {
            if ( token=='*' ) { match('*'); term = term * factor( ); }
            else { match('/'); term = term / factor( ); }
        }
        return term;
    }

Recursive Descent Example (12)
Input: 2 * 3 + ( 4 - 5 ) / 6
Progress: token = "4"
Call stack:
    input line:  ans = expr( );
    expr:        expr = expr + term( );
    term:        term = factor( );
    factor:      match('('); fact = expr( );
Code for factor:
    factor( ) {
        if ( token=='(' ) { match('('); fact = expr( ); match(')'); }
        else { fact = number( ); }
        return fact;
    }

Recursive Descent Example (13)
Input: 2 * 3 + ( 4 - 5 ) / 6
Progress: token = "4"
Call stack:
    input line:  ans = expr( );
    expr:        expr = expr + term( );
    term:        term = factor( );
    factor:      fact = expr( );
    expr:        expr = term( );
Code for expr:
    expr( ) {
        expr = term( );
        while ( token=='+' || token=='-' ) {
            if ( token=='+' ) { match('+'); expr = expr + term( ); }
            else { match('-'); expr = expr - term( ); }
        }
        return expr;
    }

Recursive Descent Example (14)
Input: 2 * 3 + ( 4 - 5 ) / 6
Progress: token = "-"
Call stack:
    input line:  ans = expr( );
    expr:        expr = expr + term( );
    term:        term = factor( );
    factor:      fact = expr( );
    expr:        expr = term( );
    term:        term = factor( );
    factor:      fact = number( ); /* = 4, token = "-" */

Recursive Descent Example (15)
Input: 2 * 3 + ( 4 - 5 ) / 6
Progress: token = "-", then token = "5"
Call stack:
    input line:  ans = expr( );
    expr:        expr = expr + term( );
    term:        term = factor( );
    factor:      fact = expr( );
    expr:        match('-'); expr = expr - term( );
Code for expr:
    expr( ) {
        expr = term( );
        while ( token=='+' || token=='-' ) {
            if ( token=='+' ) { match('+'); expr = expr + term( ); }
            else { match('-'); expr = expr - term( ); }
        }
        return expr;
    }

Recursive Descent Example (16)
Input: 2 * 3 + ( 4 - 5 ) / 6
Progress: token = "5"
Call stack:
    input line:  ans = expr( );
    expr:        expr = expr + term( );
    term:        term = factor( );
    factor:      fact = expr( );
    expr:        expr = expr - term( );
    term:        term = factor( );
    factor:      fact = number( ); /* = 5, token = ")" */

Recursive Descent Example (16, continued)
Input: 2 * 3 + ( 4 - 5 ) / 6
Progress: token = "/"
Call stack (the innermost calls return one by one):
    input line:  ans = expr( );
    expr:        expr = expr + term( );
    term:        term = factor( ); /* the loop test comes next, token = "/" */
    factor:      fact = expr( ); match(')'); return fact;
    expr:        expr = 4 - 5; return expr
    term:        term = 5; return term;
    factor:      return 5
Code for term:
    term( ) {
        term = factor( );
        while ( token=='*' || token=='/' ) {
            if ( token=='*' ) { ... }
            ...
        }
    }

Recursive Descent Example (17)
Input: 2 * 3 + ( 4 - 5 ) / 6
Progress: token = "6"
Call stack:
    input line:  ans = expr( );
    expr:        expr = expr + term( );
    term:        term = -1; match('/'); term = term / factor( );
Code for term:
    term( ) {
        term = factor( );
        while ( token=='*' || token=='/' ) {
            if ( token=='*' ) { match('*'); term = term * factor( ); }
            else { match('/'); term = term / factor( ); }
        }
        return term;
    }
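To replay this walk-through on a machine, here is a sketch of the calculator that prints one line each time expr, term, or factor is entered, indented by call depth. Assumptions not in the slides: tokens are separated by blanks (as in the example input), the trace format is made up, and the names CalculatorSketch, enter, leave, and eval are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CalculatorSketch {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;            // index of the next unread token
    private String indent = "";     // grows with the call depth, for the trace
    private final boolean trace;

    CalculatorSketch(String input, boolean trace) {
        this.trace = trace;
        // Assumption: tokens are separated by blanks, as in "2 * 3 + ( 4 - 5 ) / 6".
        for (String t : input.trim().split("\\s+")) tokens.add(t);
    }

    private String token() { return pos < tokens.size() ? tokens.get(pos) : "EOF"; }

    private void match(String what) {
        if (!token().equals(what))
            throw new IllegalStateException("expected " + what + " got " + token());
        pos++;
    }

    // Trace helpers: print the rule being entered and the current token.
    private void enter(String rule) {
        if (trace) System.out.println(indent + rule + "  token = " + token());
        indent += "  ";
    }
    private double leave(double value) {
        indent = indent.substring(2);
        return value;
    }

    // expr: term { (+|-) term }
    double expr() {
        enter("expr");
        double value = term();
        while (token().equals("+") || token().equals("-")) {
            if (token().equals("+")) { match("+"); value += term(); }
            else                     { match("-"); value -= term(); }
        }
        return leave(value);
    }

    // term: factor { (*|/) factor }
    double term() {
        enter("term");
        double value = factor();
        while (token().equals("*") || token().equals("/")) {
            if (token().equals("*")) { match("*"); value *= factor(); }
            else                     { match("/"); value /= factor(); }
        }
        return leave(value);
    }

    // factor: '(' expr ')' | number
    double factor() {
        enter("factor");
        double value;
        if (token().equals("(")) { match("("); value = expr(); match(")"); }
        else { value = Double.parseDouble(token()); pos++; }   // throws if not a number
        return leave(value);
    }

    // Evaluates a whole input line without tracing.
    static double eval(String input) {
        CalculatorSketch c = new CalculatorSketch(input, false);
        double value = c.expr();
        if (c.pos != c.tokens.size())
            throw new IllegalStateException("unexpected " + c.token());
        return value;
    }

    public static void main(String[] args) {
        System.out.println(new CalculatorSketch("2 * 3 + ( 4 - 5 ) / 6", true).expr());
    }
}
```

The indentation in the trace mirrors the call stack drawn on the slides: each nested expr, term, or factor call appears one level deeper than its caller.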

Imperative Approach to Parsing
- In the generic algorithm, the token is a global variable, and the results of the parse are a side effect (a change to global variables or structures).
- bison and flex operate this way, too.
- Programs written this way are difficult to understand and maintain.
- There is no error recovery in the generic algorithm.
    /* yylex uses global variables / constants. */
    int yylex( ) {
        ...
        if ( isdigit(c) ) {
            ungetc(c, stdin);
            scanf("%lf", &yylval);
            return INT;
        }

O-O Approach to Parsing
In the O-O approach, methods can return objects, which allows a scanner and parser without global variables. First, let's look at the overall design.
Class diagram:
    <<interface>> Iterator:  hasNext(); next()
    <<enum>> TokenType:      regex : Pattern; IDENTIFIER, OPERATOR, NUMBER
    Scanner:                 instream : InputStream; token : Token; hasNext( ) : boolean; next( ) : Token
    Token:                   type; value
    Parser:                  parseTree : TreeSet; token : Token; scanner : Iterator; expression( ) : Node; term( ) : Node; factor( ) : Node; match( String ) : boolean

O-O Scanner
The Scanner should provide two services: test for more tokens and return the next token. In this view, a Scanner looks like an Iterator<Token>.
A "token" has both a type and a value.
    /** Token class */
    public class Token {
        Type type;             /* consider an enumeration */
        public Object value;   /* can be anything */
        public Token(Type type, Object value) {...}
        public Object getValue( ) { ... }
    }
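One way such a scanner could look, as a sketch only: a regular expression with one named group per token type, applied at the current position each time next() is called. The token patterns, the set of operator characters, and the class name ScannerSketch are assumptions, not the course's actual design.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScannerSketch implements Iterator<ScannerSketch.Token> {
    enum TokenType { NUMBER, IDENTIFIER, OPERATOR }

    static class Token {
        final TokenType type;
        final Object value;
        Token(TokenType type, Object value) { this.type = type; this.value = value; }
        Object getValue() { return value; }
        @Override public String toString() { return type + "(" + value + ")"; }
    }

    // One named group per token type; leading whitespace is skipped.
    // These patterns are assumptions made for the sketch.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(?:(?<num>\\d+(?:\\.\\d+)?)|(?<id>[A-Za-z_]\\w*)|(?<op>[-+*/^()=]))");

    private final String input;
    private final Matcher matcher;
    private int pos = 0;   // offset of the first unread character

    ScannerSketch(String input) {
        this.input = input;
        this.matcher = TOKEN.matcher(input);
    }

    // More tokens remain if anything other than whitespace is left.
    @Override public boolean hasNext() {
        return !input.substring(pos).trim().isEmpty();
    }

    @Override public Token next() {
        if (!hasNext()) throw new NoSuchElementException();
        matcher.region(pos, input.length());
        if (!matcher.lookingAt())
            throw new IllegalStateException("unrecognized input near offset " + pos);
        pos = matcher.end();
        if (matcher.group("num") != null)
            return new Token(TokenType.NUMBER, Double.valueOf(matcher.group("num")));
        if (matcher.group("id") != null)
            return new Token(TokenType.IDENTIFIER, matcher.group("id"));
        return new Token(TokenType.OPERATOR, matcher.group("op"));
    }

    public static void main(String[] args) {
        Iterator<Token> scanner = new ScannerSketch("y = ( a*x + 3.5 )");
        while (scanner.hasNext()) System.out.println(scanner.next());
    }
}
```

Because the scanner is just an Iterator<Token>, a parser can hold it in a field and needs no global state; for testing, any other Iterator<Token> (for example a list's iterator) can be substituted.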

O-O Parser
The Parser implements the parsing algorithm. The result is either a parse tree or a value (calculator application). Use an attribute to represent the next token.
    /** Parser class */
    public class Parser {
        Iterator<Token> scanner;
        private Token token;
        private TreeNode result;   /* parse tree */
        TreeNode expression( ) { ... }
        TreeNode term( ) { ... }
        TreeNode factor( ) { ... }
        boolean match( String what ) { ... }
        boolean match( Type what ) { ... }
    }

O-O Parser for Calculator
For a calculator, the parser can compute the result directly, so expression( ), term( ), and factor( ) can return a primitive data type.
    /** Parser class */
    public class Parser {
        Iterator<Token> scanner;
        private Token token;
        private double result;
        double expression( ) { ... }
        double term( ) { ... }
        double factor( ) { ... }
        boolean match( String what ) {...}
    }

Observation: match
In the generic algorithm, the token is almost always tested before calling match. Eliminate the redundancy by redefining match(value) to return a boolean: true if the token matches, in which case the token is also consumed; false otherwise.
    private boolean match( String what ) {
        if ( ! (token.value instanceof String) ) return false;
        if ( what.equals( (String)(token.value) ) ) {
            token = scanner.next( );
            return true;
        }
        return false;
    }

O-O Parser for Calculator (2)
Example method: expression
EBNF: expr ::= term { (+ | -) term }
    private double expression( ) {
        double result = term( );
        while( true ) {
            if ( match("+") ) result += term( );
            else if ( match("-") ) result -= term( );
            else break;   /* why not error( )? */
        }
        return result;
    }
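The same method shape also works when the parser builds a tree instead of computing a value. A sketch under assumptions: tokens are plain strings separated by blanks, and Node is a hypothetical stand-in for the TreeNode type that the slides leave unspecified.

```java
import java.util.Arrays;
import java.util.Iterator;

class TreeParserSketch {
    // Hypothetical node type: a leaf holds a number, an inner node an operator.
    static class Node {
        final String label;
        final Node left, right;
        Node(String label, Node left, Node right) {
            this.label = label; this.left = left; this.right = right;
        }
        // Prefix form, e.g. (+ 1 2)
        @Override public String toString() {
            return left == null ? label : "(" + label + " " + left + " " + right + ")";
        }
    }

    private final Iterator<String> scanner;
    private String token;   // next unread token, or null at end of input

    TreeParserSketch(String input) {
        scanner = Arrays.asList(input.trim().split("\\s+")).iterator();
        advance();
    }

    private void advance() { token = scanner.hasNext() ? scanner.next() : null; }

    // Boolean match: consume the token only if it equals 'what'.
    private boolean match(String what) {
        if (!what.equals(token)) return false;
        advance();
        return true;
    }

    // expr ::= term { (+ | -) term }
    Node expression() {
        Node result = term();
        while (true) {
            if (match("+")) result = new Node("+", result, term());
            else if (match("-")) result = new Node("-", result, term());
            else break;
        }
        return result;
    }

    // term ::= factor { (* | /) factor }
    Node term() {
        Node result = factor();
        while (true) {
            if (match("*")) result = new Node("*", result, factor());
            else if (match("/")) result = new Node("/", result, factor());
            else break;
        }
        return result;
    }

    // factor ::= '(' expr ')' | number
    Node factor() {
        if (match("(")) {
            Node inner = expression();
            if (!match(")")) throw new IllegalStateException("missing right parenthesis");
            return inner;
        }
        if (token == null || !token.matches("\\d+(\\.\\d+)?"))
            throw new IllegalStateException("invalid number: " + token);
        Node leaf = new Node(token, null, null);
        advance();
        return leaf;
    }

    public static void main(String[] args) {
        // Prints (+ (* 2 3) (/ (- 4 5) 6))
        System.out.println(new TreeParserSketch("2 * 3 + ( 4 - 5 ) / 6").expression());
    }
}
```

This also suggests an answer to "why not error( )?": when neither operator matches, the expression may simply be finished (the next token could be ")" or the end of the line), so the loop ends and the caller decides whether the token is acceptable.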

O-O Parser : Top-Level
What is the top-level routine of the parser? Look at standard bison code for inspiration:
    %%   /* Bison grammar rules */
    input : /* empty input */
          | input line
          ;
    line  : expr '\n'   { output( $1 ); }

Parsing Errors
How are you going to handle parsing errors? You might have many levels of function calls...
    input line:  result = expr( );
    expr:        expr = term( ) { +|- term( ) };
    term:        term = factor( ) { *|/ factor( ) };
    factor:      factor = '(' expr() ')' | number() ...;
Using recursive descent, parse errors are usually detected at the bottom of the tree: in factor, number, etc.
    expr
    term
    factor:      parse error found here

Parsing Errors
If you set an error flag or return an error result, then all the methods must check for this condition...
    input line:  if ( error ) print "parse error";
    expr:        if ( error ) return /* what value? */;
    term:        if ( error ) return /* what value? */;
    factor:      if ( error ) return /* what value? */;
    expr:        if ( error ) return /* what value? */;
    term:        if ( error ) return /* what value? */;
    factor:      parse error found here
This error checking will make your methods longer and harder to understand.

Throwing an Exception
Your code will be simpler if the methods simply throw an exception and let the top-most method catch it. Let someone else handle it!
    input line:  try { result = expr( ); } catch (ParseException e) {/*error*/}
    expr:        expr( ) throws ParseException { ... }
    term:        term( ) throws ParseException { ... }
    factor:      factor( ) throws ParseException { ... }
    expr:        expr( ) throws ParseException { ... }
    term:        term( ) throws ParseException { ... }
    factor:      throw new ParseException( )

Using Java's ParseException
Java has a ParseException class you can use: java.text.ParseException
The constructor requires two parameters:
    new ParseException("error message", offset);
Example:
    number( ) {
        /* parse a number */
        whitespace();
        token = tokenizer.next();
        if ( token.type != TokenType.NUMBER )
            throw new ParseException( "invalid number", cptr );

Defining your own ParseException
You can define a new Exception type for your own use.
    import java.io.IOException;
    class ParseException extends IOException {
        /* constructors */
        ParseException() { super("Parse Error"); }
        ParseException(String msg) { super(msg); }
        ParseException(String msg, int column) {
            super(msg + " in column " + column);
        }
    }

Using ParseException
You should try to return useful error messages, such as...
    factor( ) {
        if ( match("(") ) {
            result = expr( );
            if ( ! match(")") )
                throw new ParseException("missing right parenthesis");
        }
The getMessage( ) method returns the error message...
    try {
    } catch(ParseException e) {
        println( e.getMessage() );
    }
Including the column number in error messages can be helpful.
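A small end-to-end sketch of this style using java.text.ParseException: the parsing methods only throw, and a single top-level method catches and reports the message together with the offset. Assumptions: the parser reads characters directly instead of using a scanner, the offset is a zero-based character position, and ErrorHandlingSketch and evaluateLine are hypothetical names.

```java
import java.text.ParseException;

class ErrorHandlingSketch {
    private final String input;
    private int pos = 0;   // zero-based column of the next unread character

    ErrorHandlingSketch(String input) { this.input = input; }

    private void skipBlanks() {
        while (pos < input.length() && input.charAt(pos) == ' ') pos++;
    }

    // Boolean match on a single character; consumes it on success.
    private boolean match(char what) {
        skipBlanks();
        if (pos < input.length() && input.charAt(pos) == what) { pos++; return true; }
        return false;
    }

    double expr() throws ParseException {
        double result = term();
        while (true) {
            if (match('+')) result += term();
            else if (match('-')) result -= term();
            else return result;
        }
    }

    double term() throws ParseException {
        double result = factor();
        while (true) {
            if (match('*')) result *= factor();
            else if (match('/')) result /= factor();
            else return result;
        }
    }

    double factor() throws ParseException {
        if (match('(')) {
            double result = expr();
            if (!match(')')) throw new ParseException("missing right parenthesis", pos);
            return result;
        }
        return number();
    }

    double number() throws ParseException {
        skipBlanks();
        int start = pos;
        while (pos < input.length()
                && (Character.isDigit(input.charAt(pos)) || input.charAt(pos) == '.')) pos++;
        if (start == pos) throw new ParseException("invalid number", pos);
        try {
            return Double.parseDouble(input.substring(start, pos));
        } catch (NumberFormatException e) {
            throw new ParseException("invalid number", start);
        }
    }

    // Top-level routine: the only place that catches the exception.
    static String evaluateLine(String line) {
        ErrorHandlingSketch parser = new ErrorHandlingSketch(line);
        try {
            double value = parser.expr();
            parser.skipBlanks();
            if (parser.pos < line.length())
                throw new ParseException("unexpected input", parser.pos);
            return String.valueOf(value);
        } catch (ParseException e) {
            return e.getMessage() + " in column " + e.getErrorOffset();
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluateLine("(1 + 2) * 3"));
        System.out.println(evaluateLine("(1 + 2 * 3"));
        System.out.println(evaluateLine("1 + * 2"));
    }
}
```

None of expr, term, or factor contains an error check for its callees; the throws clause carries the error up to evaluateLine, which is the point the slides are making.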

Parsing Unary Minus Sign
Parsing negative numbers and unary minus can also be tricky. The following are valid expressions in most languages:
    sum = sum + -1;
    sum = sum - -2;
    sum = sum * -x;
The GNU C compiler (gcc) allows a space after the unary "-":
    sum = sum - - 2;
Exponentiation has higher precedence than unary minus, so it should be incorporated in a rule at the bottom of your grammar rules:
    -2 ^ 3 means - (2^3)
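One possible way to get this precedence, offered as a sketch and not as the course's grammar: let factor handle the unary minus and put exponentiation in a rule below it. The rules assumed here are factor => '-' factor | power, power => primary [ '^' factor ], primary => '(' expr ')' | number. The class name UnaryMinusSketch is hypothetical.

```java
class UnaryMinusSketch {
    private final String s;
    private int pos = 0;   // index of the next unread character

    UnaryMinusSketch(String s) { this.s = s; }

    // Skips blanks, then consumes 'what' if it is the next character.
    private boolean match(char what) {
        while (pos < s.length() && s.charAt(pos) == ' ') pos++;
        if (pos < s.length() && s.charAt(pos) == what) { pos++; return true; }
        return false;
    }

    double expr() {
        double v = term();
        while (true) {
            if (match('+')) v += term();
            else if (match('-')) v -= term();
            else return v;
        }
    }

    double term() {
        double v = factor();
        while (true) {
            if (match('*')) v *= factor();
            else if (match('/')) v /= factor();
            else return v;
        }
    }

    // factor => '-' factor | power
    double factor() {
        return match('-') ? -factor() : power();
    }

    // power => primary [ '^' factor ]
    // The exponent is a factor, so ^ is right associative and may be negated.
    double power() {
        double base = primary();
        return match('^') ? Math.pow(base, factor()) : base;
    }

    // primary => '(' expr ')' | number
    double primary() {
        if (match('(')) {
            double v = expr();
            if (!match(')')) throw new IllegalArgumentException("missing right parenthesis");
            return v;
        }
        int start = pos;   // match() has already skipped leading blanks
        while (pos < s.length()
                && (Character.isDigit(s.charAt(pos)) || s.charAt(pos) == '.')) pos++;
        if (start == pos) throw new IllegalArgumentException("invalid number at column " + pos);
        return Double.parseDouble(s.substring(start, pos));
    }

    static double eval(String text) { return new UnaryMinusSketch(text).expr(); }

    public static void main(String[] args) {
        System.out.println(eval("-2 ^ 3"));    // -(2^3) = -8.0
        System.out.println(eval("5 - - 2"));   // 7.0
    }
}
```

Because the minus is consumed in factor before power is called, -2 ^ 3 parses as -(2^3), and because match skips blanks, "5 - - 2" is accepted just as gcc accepts it.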

What's Next?
Later we will add to the implementation...
- symbol table and assignments
    x = 3.5E7
    a = 5
    b = 0.1
    y = ( a*x + b ) / ( a*x - b )
- built-in functions
    y = sqrt( x )
- user defined functions
    function f(x) = a*x + b
    f(0.5)