Recursive Descent Parsing for XML Developers
Roger L. Costello
15 October 2014



Table of Contents
- Introduction to parsing in general, and recursive descent parsing in particular
- Example #1: How to do recursive descent parsing on Book data
- Example #2: How to do recursive descent parsing for a grammar that contains alternatives
- Limitations of recursive descent parsing

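Flat XML Document
You might receive an XML document that has no structure. For example, this XML document contains a flat (linear) list of Book data:
  <Title>Parsing Techniques</Title>
  <Author>Dick Grune</Author>
  <Author>Ceriel J.H. Jacobs</Author>
  <Date>2007</Date>
  <ISBN>…</ISBN>
  <Publisher>Springer</Publisher>
  <Title>Introduction to Graph Theory</Title>
  <Author>Richard J. Trudeau</Author>
  <Date>…</Date>
  <ISBN>…</ISBN>
  <Publisher>Dover Publications</Publisher>
  <Title>Introduction to Formal Languages</Title>
  <Author>Gyorgy E. Revesz</Author>
  <Date>…</Date>
  <ISBN>…</ISBN>
  <Publisher>Dover Publications</Publisher>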

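Give it structure to facilitate processing
  <Books>
    <Book>
      <Title>Parsing Techniques</Title>
      <Authors>
        <Author>Dick Grune</Author>
        <Author>Ceriel J.H. Jacobs</Author>
      </Authors>
      <Date>2007</Date>
      <ISBN>…</ISBN>
      <Publisher>Springer</Publisher>
    </Book>
    <Book>
      <Title>Introduction to Graph Theory</Title>
      <Authors>
        <Author>Richard J. Trudeau</Author>
      </Authors>
      <Date>…</Date>
      <ISBN>…</ISBN>
      <Publisher>Dover Publications</Publisher>
    </Book>
    <Book>
      <Title>Introduction to Formal Languages</Title>
      <Authors>
        <Author>Gyorgy E. Revesz</Author>
      </Authors>
      <Date>…</Date>
      <ISBN>…</ISBN>
      <Publisher>Dover Publications</Publisher>
    </Book>
  </Books>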

That’s parsing!
Parsing is taking a flat (linear) sequence of items and adding structure so that the result conforms to a grammar.

From the book “Parsing Techniques”
Parsing is the process of structuring a linear representation in accordance with a given grammar. The “linear representation” may be:
- a flat sequence of XML elements
- a sentence
- a computer program
- a knitting pattern
- a sequence of geological strata
- a piece of music
- actions of ritual behavior

Grammar
A grammar is a succinct description of the structure. Here is a grammar for Books:
  Books → Book+
  Book → Title Authors Date ISBN Publisher
  Authors → Author+
  Title → text
  Author → text
  Date → text
  ISBN → text
  Publisher → text

Parsing
The parser takes two inputs, the grammar and the linear representation (the flat list of Book elements), and produces the structured representation.

Alternate view of the parser output
The parser output can also be drawn as a parse tree. Books is the root. Each Book node has the children Title, Authors, Date, ISBN, and Publisher. Each Authors node has one or more Author children. The input tokens are the leaves.

Parsing Techniques
Over the last 50 years many parsing techniques have been created. Some parsing techniques work from the starting grammar rule down to the bottom rules. Those are called top-down parsing techniques. Other parsing techniques work from the bottom grammar rules up to the starting grammar rule. Those are called bottom-up parsing techniques. The following slides explain the “recursive descent parsing technique.” It is a top-down parsing technique.

Terminology: Token
A token is an atomic (indivisible) unit. Each item in the input is a token. After parsing, the tokens will be leaf nodes.

The input consists of a sequence of tokens
Each element in the flat Book input (each Title, Author, Date, ISBN, and Publisher) is a token. This input consists of 16 tokens.
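
As a rough illustration of what "a sequence of tokens" can look like in code, the sketch below reads a flat XML document and turns each child element into a (name, text) pair. The `<Books>` root element, the `tokenize` helper, and every Date and ISBN value other than 2007 are assumptions made for this example; they are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat input. The root name and the placeholder Date/ISBN
# values (everything except 2007) are assumptions for illustration only.
FLAT_XML = """
<Books>
  <Title>Parsing Techniques</Title>
  <Author>Dick Grune</Author>
  <Author>Ceriel J.H. Jacobs</Author>
  <Date>2007</Date>
  <ISBN>isbn-1</ISBN>
  <Publisher>Springer</Publisher>
  <Title>Introduction to Graph Theory</Title>
  <Author>Richard J. Trudeau</Author>
  <Date>date-2</Date>
  <ISBN>isbn-2</ISBN>
  <Publisher>Dover Publications</Publisher>
  <Title>Introduction to Formal Languages</Title>
  <Author>Gyorgy E. Revesz</Author>
  <Date>date-3</Date>
  <ISBN>isbn-3</ISBN>
  <Publisher>Dover Publications</Publisher>
</Books>
"""

def tokenize(xml_text):
    """Return the children of the root element as (name, text) pairs."""
    root = ET.fromstring(xml_text.strip())
    return [(child.tag, child.text) for child in root]

tokens = tokenize(FLAT_XML)
print(len(tokens))   # number of tokens in the flat input
print(tokens[0])     # the first token
```

Each pair keeps the element name, which the parser matches against the grammar, and the text, which ends up in a leaf node.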

After parsing, the tokens will be leaf nodes
In the structured output, the tokens (terminal symbols) are the leaf elements: Title, Author, Date, ISBN, and Publisher. In the parse tree view, they are the leaves beneath the Books, Book, and Authors nodes.

Parsing structures the input by wrapping the tokens in non-terminal symbols
The non-terminal symbols are Books, Book, and Authors.

Recursive descent parsing
Recursive descent parsing works like this:
- Start at the grammar’s start symbol and output it. In our grammar, the start symbol is Books, so output it.
- Progress through each grammar rule.
- For a non-terminal symbol, output it.
- For a terminal symbol (i.e., token), check the current token in the input stream for a match with the terminal symbol; if it matches, output it.

Initial
Start with the grammar’s start symbol (Books) and the first token in the input stream (the first Title).

Output the start symbol
Output: <Books>

Grammar says there must be at least one Book
The rule Books → Book+ means the input stream must contain all the tokens for at least one Book. Let’s process the grammar rule for Book.

Output the Book symbol
Output: <Books> <Book>

Grammar says the token in the input stream must be Title
The current token is <Title>Parsing Techniques</Title>. Yes, the input token matches the grammar rule.
Appended to the output: <Title>Parsing Techniques</Title>

Grammar: after Title must be Authors
So the input stream must contain Author tokens. Let’s process the rule for Authors.

Output the Authors symbol
Appended to the output: <Authors>

Grammar says the next token in the input stream must be an Author token
The current token is <Author>Dick Grune</Author>. Yes, the input token matches the grammar rule.
Appended to the output: <Author>Dick Grune</Author>

Grammar says the next token in the input stream may be an Author token
The current token is <Author>Ceriel J.H. Jacobs</Author>. Another Author match.
Appended to the output: <Author>Ceriel J.H. Jacobs</Author>

The next token in the input stream is not an Author token
The current token is a Date, so the Authors rule is finished. Return to the caller (i.e., return to the Book rule).

Grammar says the input stream token must be a Date token
The current token is <Date>2007</Date>. Yes, the input token matches the grammar rule.
Appended to the output: <Date>2007</Date>

Grammar says the input stream token must be an ISBN token
The current token is an ISBN. Yes, the input token matches the grammar rule, so it is appended to the output.

Grammar says the input stream token must be a Publisher token
The current token is <Publisher>Springer</Publisher>. Yes, the input token matches the grammar rule.
Appended to the output: <Publisher>Springer</Publisher>

We’ve completed structuring the first 6 input tokens
The output now holds one complete Book: a Title, an Authors element with two Author children, a Date, an ISBN, and a Publisher.

Completed the Book rule
We’ve finished processing the Book rule, so return to the caller (i.e., the Books rule).

Begin work on structuring the next Book
The rule Books → Book+ allows further Books, and the current token is <Title>Introduction to Graph Theory</Title>, so the Book rule is processed again.

Implementation
The following slides show, step by step, how to implement a recursive descent parser.

Step 1
Create a function for each non-terminal symbol in the grammar:
  Grammar rule                                   Function
  Books → Book+                                  Books() { … }
  Book → Title Authors Date ISBN Publisher       Book() { … }
  Authors → Author+                              Authors() { … }

Step 2
Create a global variable, Token, that is used to identify the current position in the input stream. Initialize Token to 0:
  Token = 0

Step 3
Create a function, get_next_token(). When it is called, it increments the current position in the input stream:
  get_next_token() {
    Token = Token + 1
  }

Step 4
Create a function, token(), and pass it a name, tk. The purpose of this function is to answer the question: “Does the token at the current position in the input stream match tk?”

Example of using the token() function
Suppose that during recursive descent parsing the grammar indicates that the next token in the input stream must be “Title.” Suppose the global variable, Token, indicates that the current position in the input stream is a Title element.

Example (cont.)
The token() function determines that there is a match, so it calls get_next_token() to increment the position in the input stream and returns the matched Title token.

The token() function
  token(string tk) {
    if (tk != input[position() = Token]) then return ()
    else {
      matched = input[position() = Token]
      get_next_token()
      return matched
    }
  }
Notice that token() returns empty if there is not a match. On a match, it saves the matched token before calling get_next_token(), so that it returns the matched token and not the one that follows it.

Motivation for Step 5
Suppose that during recursive descent parsing we are in the Book() function. The Book() function first checks, by calling the token() function, whether the current position of the input stream contains “Title.” Suppose it does. Then, according to the grammar, there must be Authors, Date, ISBN, and then Publisher:
  Book → Title Authors Date ISBN Publisher
If any of them is missing, the input is invalid and the parser must report an error.

Step 5
Create a function, require(), and pass it a token, found. If the token is empty (i.e., the token() function returned empty because there was not a match) then call the error() function. Otherwise, return the token.
  require(element found) {
    if empty(found) then error('Invalid input')
    else return found
  }

Step 6
Create an error function, error(). Pass it a string. It outputs the string and then halts the parser.
  error(string s) {
    output s
    stop
  }
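
One way to express Steps 5 and 6 in Python is to let an exception do the "output the string and halt" part. The `ParseError` class is an assumption of this sketch, and `None` again plays the role of the empty result.

```python
class ParseError(Exception):
    """Hypothetical exception type used to halt the parser."""

def error(message):
    """Step 6: report the problem and stop parsing (here, by raising)."""
    raise ParseError(message)

def require(found):
    """Step 5: pass a successful match through, reject an empty one."""
    if found is None:
        error("Invalid input")
    return found

print(require(("Title", "Parsing Techniques")))  # returned unchanged
try:
    require(None)
except ParseError as problem:
    print("parser halted:", problem)
```

Raising instead of exiting the process lets a caller decide whether a failed parse should end the program or just be reported.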

The complete implementation
Recursive descent has been around a long time and people have developed beautiful code for it. The following two slides collect all the code from the previous slides. I recommend spending some time studying it to appreciate its beauty.

Code for a Recursive Descent Parser
  Token = 0

  main() {
    get_next_token()
    require(input())
  }

  input() {
    return require(Books())
  }

  Books() {
    return (require(Book()), optional_additional_Books())
  }

  optional_additional_Books() {
    book = Book()
    if exists(book) then return (book, optional_additional_Books())
  }

  Book() {
    title = token('Title')
    if exists(title) then
      return (title,
              require(Authors()),
              require(token('Date')),
              require(token('ISBN')),
              require(token('Publisher')))
  }

  Authors() {
    return (require(token('Author')), optional_additional_Authors())
  }

  optional_additional_Authors() {
    author = token('Author')
    if exists(author) then return (author, optional_additional_Authors())
  }

  token(string tk) {
    if (tk != input[position() = Token]) then return ()
    else {
      matched = input[position() = Token]
      get_next_token()
      return matched
    }
  }

  require(element found) {
    if empty(found) then error('Invalid input')
    else return found
  }

  error(string s) {
    output s
    stop
  }

  get_next_token() {
    Token = Token + 1
  }
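
For readers who want something runnable, here is a minimal Python sketch of the same parser. It is an approximation under stated assumptions, not the author's XSLT: the `BooksParser` class, the `TOKENS` list with its placeholder Date and ISBN values, and the final leftover-token check are all additions for illustration, and a `while` loop stands in for the recursive `optional_additional_*` functions.

```python
import xml.etree.ElementTree as ET

class ParseError(Exception):
    """Raised when the tokens do not conform to the Books grammar."""

class BooksParser:
    def __init__(self, tokens):
        self.tokens = tokens      # list of (name, text) pairs
        self.pos = 0              # index of the current token

    def token(self, name):
        """Consume the current token if it is called `name`, else None."""
        if self.pos < len(self.tokens) and self.tokens[self.pos][0] == name:
            tag, text = self.tokens[self.pos]
            self.pos += 1
            leaf = ET.Element(tag)
            leaf.text = text
            return leaf
        return None

    def require(self, found):
        if found is None:
            raise ParseError(f"Invalid input at token {self.pos}")
        return found

    def books(self):              # Books -> Book+
        node = ET.Element("Books")
        node.append(self.require(self.book()))
        while (book := self.book()) is not None:
            node.append(book)
        return node

    def book(self):               # Book -> Title Authors Date ISBN Publisher
        title = self.token("Title")
        if title is None:
            return None
        node = ET.Element("Book")
        node.append(title)
        node.append(self.require(self.authors()))
        for name in ("Date", "ISBN", "Publisher"):
            node.append(self.require(self.token(name)))
        return node

    def authors(self):            # Authors -> Author+
        first = self.token("Author")
        if first is None:
            return None
        node = ET.Element("Authors")
        node.append(first)
        while (author := self.token("Author")) is not None:
            node.append(author)
        return node

    def parse(self):
        tree = self.books()
        if self.pos != len(self.tokens):   # extra check, not in the slides
            raise ParseError(f"Unexpected token at position {self.pos}")
        return tree

# Placeholder values: only 2007 appears in the slides.
TOKENS = [
    ("Title", "Parsing Techniques"), ("Author", "Dick Grune"),
    ("Author", "Ceriel J.H. Jacobs"), ("Date", "2007"),
    ("ISBN", "isbn-1"), ("Publisher", "Springer"),
    ("Title", "Introduction to Graph Theory"), ("Author", "Richard J. Trudeau"),
    ("Date", "date-2"), ("ISBN", "isbn-2"), ("Publisher", "Dover Publications"),
]

tree = BooksParser(TOKENS).parse()
print(ET.tostring(tree, encoding="unicode"))
```

Each method mirrors one grammar rule, so the call stack at any moment corresponds to the path from the root of the parse tree down to the current token.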

XSLT Implementation
I created an XSLT implementation. I tried to mirror the beautiful code shown on the previous slides. If you would like to give my implementation a go, here is the XSLT program and a sample flat (linear) input XML document:
  parser.xsl
  test.xml

Richer example
The Books example shown on the previous slides was fine for introducing recursive descent parsing. But it glossed over an important problem: grammar rules with alternatives. The following example shows how to do recursive descent parsing with a grammar that has alternatives.

Expressions
Let’s parse a simple expression language that has these tokens: IDENTIFIER, addition, parentheses, and EoF. Here are a few examples of expressions:
  IDENTIFIER EoF
  (IDENTIFIER) EoF
  IDENTIFIER + IDENTIFIER EoF
  (IDENTIFIER + IDENTIFIER) EoF
  IDENTIFIER + (IDENTIFIER + IDENTIFIER) EoF
  (IDENTIFIER + IDENTIFIER) + IDENTIFIER EoF
  IDENTIFIER + (IDENTIFIER + (IDENTIFIER + IDENTIFIER)) EoF
Each expression ends with an end-of-file (EoF) token.

Expression grammar
  input → expression EoF
  expression → term rest_expression
  term → IDENTIFIER | parenthesized_expression
  parenthesized_expression → '(' expression ')'
  rest_expression → '+' expression | ε

Parse tree for: IDENTIFIER EoF
  input
    expression
      term
        IDENTIFIER
      rest_expression
        ε
    EoF

Parser selects the first alternative
term has two alternatives. For this input, the parser selected the first alternative (IDENTIFIER).

Parse tree for: (IDENTIFIER) EoF
  input
    expression
      term
        parenthesized_expression
          (
          expression
            term
              IDENTIFIER
            rest_expression
              ε
          )
      rest_expression
        ε
    EoF

Parser selects the second alternative
For this input, term’s second alternative (parenthesized_expression) is selected.

Question
How does a recursive descent parser know whether it should select the first or the second alternative?
  term → IDENTIFIER | parenthesized_expression

Answer
The parser doesn’t know. It tries the first alternative. If that fails, it tries the second alternative (i.e., the parser backtracks and tries the next alternative). It repeats until it finds an alternative that succeeds.

Processing the first token in the input stream
Input tokens: ( IDENTIFIER ) EoF
The parser descends from input to expression to term and tries term’s first alternative, which says the input token must be IDENTIFIER. However, the input token is (, so we must back up and try the next alternative.

Implementation of the term() function
  term() {
    identifier = token('IDENTIFIER')
    if exists(identifier) then return (identifier)
    else return (require(parenthesized_expression()))
  }
- Check the current token in the input stream to see if it is IDENTIFIER.
- If there is a match, return the token.
- Otherwise, try the other alternative; it must succeed.
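
To see term() in the context of the whole expression grammar, the following Python sketch writes one method per grammar rule. It is an illustration under assumptions: the `ExpressionParser` class, the `parse` helper, the whitespace-separated token strings, and the nested-tuple parse tree are all choices made for this example, not the author's XSLT.

```python
class ParseError(Exception):
    """Raised when the tokens do not form a valid expression."""

class ExpressionParser:
    def __init__(self, tokens):
        self.tokens = tokens   # e.g. ["(", "IDENTIFIER", ")", "EoF"]
        self.pos = 0

    def token(self, name):
        """Consume the current token if it equals `name`, else None."""
        if self.pos < len(self.tokens) and self.tokens[self.pos] == name:
            self.pos += 1
            return name
        return None

    def require(self, found):
        if found is None:
            raise ParseError(f"Invalid input at token {self.pos}")
        return found

    def input(self):                      # input -> expression EoF
        expression = self.expression()
        self.require(self.token("EoF"))
        return ("input", expression, "EoF")

    def expression(self):                 # expression -> term rest_expression
        term = self.term()
        return ("expression", term, self.rest_expression())

    def term(self):          # term -> IDENTIFIER | parenthesized_expression
        identifier = self.token("IDENTIFIER")
        if identifier is not None:        # first alternative matched
            return ("term", identifier)
        # otherwise the second alternative must succeed
        return ("term", self.require(self.parenthesized_expression()))

    def parenthesized_expression(self):   # '(' expression ')'
        if self.token("(") is None:
            return None
        expression = self.expression()
        self.require(self.token(")"))
        return ("parenthesized_expression", "(", expression, ")")

    def rest_expression(self):            # '+' expression | ε
        if self.token("+") is None:
            return ("rest_expression", "ε")
        return ("rest_expression", "+", self.expression())

def parse(text):
    """Parse a whitespace-separated token string, e.g. '( IDENTIFIER ) EoF'."""
    return ExpressionParser(text.split()).input()

print(parse("IDENTIFIER EoF"))
print(parse("( IDENTIFIER ) EoF"))
```

Because a failed token() call leaves the position unchanged, trying the second alternative of term needs no explicit rewind here; a grammar whose first alternative could consume several tokens before failing would need to restore the position.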

Let’s represent each expression as XML
Instead of plain-text input such as the following, the parser’s input will be an XML representation of the same tokens:
  IDENTIFIER EoF
  (IDENTIFIER) EoF
  IDENTIFIER + IDENTIFIER EoF
  IDENTIFIER + (IDENTIFIER + IDENTIFIER) EoF

Parsing
For each of these inputs, the parser applies the expression grammar and outputs the structured result.

XSLT Implementation
I created an XSLT implementation of a recursive descent parser for the expression language. If you would like to give my implementation a go, here is the XSLT program and a sample flat (linear) input XML document:
  parser/expression-parser.xsl
  parser/expression-test.xml

Limitations of Recursive Descent Parsers
Recall that in a rule containing alternatives we tried the first alternative; if it failed, we backtracked and tried the second alternative. Searching the alternatives is time-consuming, because the parser may re-read the same tokens once for each alternative it tries.

Limitations (cont.)
Recursive descent parsers can’t handle left-recursive grammar rules: the parser goes into infinite recursion. Example: suppose the grammar has this rule:
  expression → expression '-' term
That is a “left-recursive” rule: its right-hand side starts with the same symbol as its left-hand side (i.e., expression). The recursive descent routine for this rule is:
  expression() {
    return expression() and require(token('-')) and require(term())
  }
The routine’s first action is to call itself, before any token has been consumed, so the recursion is infinite.
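
The problem is easy to reproduce. In the Python sketch below, the first function transcribes the left-recursive rule directly and exhausts the call stack. The second shows one common rewrite, which is not from the slides: it assumes the grammar also has an alternative `expression → term`, so that the rule can be restated as `term ('-' term)*` and handled with a loop. Both function names are hypothetical, and a term is simplified to a single token.

```python
def expression_left_recursive(tokens, pos=0):
    # expression -> expression '-' term
    # The first step is a call to itself at the same position, so no token
    # is ever consumed and the recursion cannot end.
    left, pos = expression_left_recursive(tokens, pos)
    return ("-", left, tokens[pos + 1]), pos + 2

def expression_iterative(tokens, pos=0):
    # expression -> term ('-' term)*   (a term is one token in this sketch;
    # the token list is assumed to be non-empty)
    tree = tokens[pos]
    pos += 1
    while pos + 1 < len(tokens) and tokens[pos] == "-":
        tree = ("-", tree, tokens[pos + 1])   # keeps '-' left-associative
        pos += 2
    return tree, pos

tokens = ["a", "-", "b", "-", "c"]
try:
    expression_left_recursive(tokens)
except RecursionError:
    print("the left-recursive routine never terminates")
print(expression_iterative(tokens))
```

The loop builds the tree from the left, so `a - b - c` still groups as `(a - b) - c`, which is the grouping the left-recursive rule describes.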

Limitations (cont.)
Suppose we add an array element as a term:
  term → IDENTIFIER | indexed_element | parenthesized_expression
  indexed_element → IDENTIFIER '[' expression ']'
and create a recursive descent parser for the new grammar. The routine for indexed_element will never be tried: when the sequence IDENTIFIER '[' occurs in the input, the first alternative of term will succeed, consume the identifier, and leave the indigestible part '[' expression ']' in the input.
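
The following Python sketch reproduces this failure. With the alternatives in the order given by the grammar, the input `IDENTIFIER [ IDENTIFIER ] EoF` is rejected. Everything here is an assumption made for illustration: the `Parser` class, the simplified tuple output, and in particular the `indexed_first` flag, which demonstrates one possible workaround (trying the longer alternative first) that the slides do not discuss.

```python
class ParseError(Exception):
    """Raised when the tokens cannot be parsed."""

class Parser:
    def __init__(self, tokens, indexed_first=False):
        self.tokens, self.pos = tokens, 0
        self.indexed_first = indexed_first  # hypothetical switch for the demo

    def token(self, name):
        if self.pos < len(self.tokens) and self.tokens[self.pos] == name:
            self.pos += 1
            return name
        return None

    def require(self, found):
        if found is None:
            raise ParseError(f"Invalid input at token {self.pos}")
        return found

    def input(self):                  # input -> expression EoF
        tree = self.expression()
        self.require(self.token("EoF"))
        return tree

    def expression(self):             # term, optionally followed by '+' expression
        term = self.term()
        if self.token("+"):
            return ("+", term, self.expression())
        return term

    def term(self):   # IDENTIFIER | indexed_element | parenthesized_expression
        alternatives = [self.identifier, self.indexed_element,
                        self.parenthesized_expression]
        if self.indexed_first:
            alternatives = [self.indexed_element, self.identifier,
                            self.parenthesized_expression]
        for alternative in alternatives:
            start = self.pos
            result = alternative()
            if result is not None:
                return result         # the first alternative to succeed wins
            self.pos = start          # backtrack before trying the next one
        raise ParseError(f"No alternative of term matches at token {self.pos}")

    def identifier(self):
        return self.token("IDENTIFIER")

    def indexed_element(self):        # IDENTIFIER '[' expression ']'
        if self.token("IDENTIFIER") and self.token("["):
            index = self.expression()
            self.require(self.token("]"))
            return ("index", "IDENTIFIER", index)
        return None

    def parenthesized_expression(self):   # '(' expression ')'
        if self.token("("):
            inner = self.expression()
            self.require(self.token(")"))
            return inner
        return None

def parse(text, indexed_first=False):
    return Parser(text.split(), indexed_first).input()

try:
    parse("IDENTIFIER [ IDENTIFIER ] EoF")
except ParseError as problem:
    print("grammar order fails:", problem)
print(parse("IDENTIFIER [ IDENTIFIER ] EoF", indexed_first=True))
```

In the failing run, term returns as soon as IDENTIFIER matches, and the parser then expects EoF but finds `[`. Nothing ever sends it back into term to try indexed_element.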

References – Great Books
- Modern Compiler Design (Compiler-Design-Dick-Grune/dp/ /ref=sr_1_1?s=books&ie=UTF8&qid= &sr=1-1&keywords=modern+compiler+design)
- Parsing Techniques (Practical-Monographs-Computer/dp/ X/ref=sr_1_1?s=books&ie=UTF8&qid= &sr=1-1&keywords=parsing+techniques)