Slide 1: Design Patterns for Recursive Descent Parsing
Dung Nguyen, Mathias Ricken & Stephen Wong, Rice University
Slide 2: RDP in CS2?
Context: an objects-first intro curriculum that already covers:
- Polymorphism
- Recursion
- Design patterns (visitors, factories, etc.)
- OOD principles
We want a good OOP/OOD example that is also a relevant CS topic. Recursive descent parsing fits: it allows smooth transitions from simple to complex examples while developing an abstract model, and a small change (∆) in the grammar should require only a small change (∆) in the code.
Slide 3: The Problem of Teaching RDP
Mutual recursion! Traditionally "a complex, isolated, advanced topic for upper division only." The conventional pipeline runs: new grammar → global analysis → parser generator → new code. Problems with this approach:
- Complex and non-modular; difficult to extract an overall abstraction
- Scaling up to parser generators is problematic
- Less useful pedagogically: a difficult example to learn recursion with
- The path from grammar to parser is easy for a computer but hard for humans
Slide 4: Object-Oriented Approach
The grammar must drive any processing related to it, e.g. parsing. So model the grammar first:
- Terminal symbols (tokens)
- Non-terminal symbols (including the start symbol)
- Rules
Driving forces:
- Decouple intelligent tokens from rules → visitors to tokens
- An extensible system with an open-ended number of tokens → extended visitors
Then parsing will come! The key choices: intelligent tokens vs. switching on dumb tokens; rules as visitors vs. the Interpreter pattern on tokens; localized decisions that still express the overall abstraction. Pedagogically, the tangibility of objects makes the recursion easier to understand, it is easier to see how the grammar creates the parser, and the approach fits the OO curriculum with no new concepts to master.
Slide 5: Representing Tokens
Intelligent tokens: no runtime type checking! Tokens are decoupled from their processing via the Visitor pattern. For LL(1) grammars, in any given situation the current token determines the parsing action to take, so parsing is done by visitors to the tokens.
Slide 6: Processing Tokens with Visitors
Standard Visitor pattern: the Visitor interface declares one case per token type (caseA, caseB, …). When a visitor visits Token A, the token calls back caseA; Token B calls back caseB. But we want to be able to add an unbounded number of tokens!
Slide 7: Processing Tokens with Visitors
Visitor pattern modified with Chain of Responsibility: each visitor handles the cases it knows (VisitorA handles caseA) and adds a defaultCase. When a token arrives that a visitor has no case for, the defaultCase delegates to the next visitor in the chain (VisitorB, which handles caseB), and so on. The chain handles any number of token types!
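One way to sketch the chained visitor in Java is with a case table plus a default that delegates to the next visitor. The published design uses dedicated visitor classes rather than a map of lambdas, so treat the map-based dispatch and all names here as assumptions; the delegation structure is the point.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Tokens carry a kind tag; execute() dispatches to the visitor.
class ExtToken {
    final String kind, text;
    ExtToken(String kind, String text) { this.kind = kind; this.text = text; }
    <R> R execute(ExtVisitor<R> v) { return v.caseAt(this); }
}

// A visitor holds the cases it knows; unknown tokens fall through to
// defaultCase, which may delegate to the next visitor in the chain.
class ExtVisitor<R> {
    private final Map<String, Function<ExtToken, R>> cases = new HashMap<>();
    private final Function<ExtToken, R> defaultCase;
    ExtVisitor(Function<ExtToken, R> defaultCase) { this.defaultCase = defaultCase; }
    ExtVisitor<R> addCase(String kind, Function<ExtToken, R> body) {
        cases.put(kind, body);
        return this;
    }
    R caseAt(ExtToken host) {
        return cases.getOrDefault(host.kind, defaultCase).apply(host);
    }
}

class ChainDemo {
    public static void main(String[] args) {
        // VisitorB knows "id"; VisitorA knows "num" and delegates the rest to B.
        ExtVisitor<String> visitorB = new ExtVisitor<String>(t -> "unhandled")
                .addCase("id", t -> "id:" + t.text);
        ExtVisitor<String> visitorA = new ExtVisitor<String>(t -> t.execute(visitorB))
                .addCase("num", t -> "num:" + t.text);
        System.out.println(new ExtToken("num", "3").execute(visitorA)); // num:3
        System.out.println(new ExtToken("id", "x").execute(visitorA));  // id:x
    }
}
```

New token kinds only require a new visitor appended to the chain; no existing visitor is edited.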
Slide 8: Modeling an LL(1) Grammar
Preparing the LL(1) grammar: left-factoring makes the grammar predictively parsable.
Before:
  E ::= F | F + E
  F ::= num | id
After left-factoring:
  E ::= F E1
  E1 ::= empty | + E
  F ::= num | id
Slide 9: Modeling an LL(1) Grammar
Next, associate each right-hand side with a unique symbol. This separates sequences from branches and wraps terminals: in rules with multiple alternatives (branches), replace sequences and tokens with unique non-terminal symbols, so that branches contain only non-terminals.
  E ::= F E1
  E1 ::= empty | E1a
  E1a ::= + E
  F ::= F1 | F2
  F1 ::= num
  F2 ::= id
Slide 10: Modeling an LL(1) Grammar
Branches are modeled by inheritance ("is-a"): a rule A ::= B | C becomes a union, with B is-a A and C is-a A (e.g. F1 is-a F and F2 is-a F). Sequences are modeled by composition ("has-a"): a rule S ::= X Y becomes a composite with sequential processing (e.g. E1a has-a + and has-a E). Each class needs only a local view of the grammar, and this modeling applies to non-terminals only.
Slide 11: Object Model of Grammar
  E ::= F E1
  E1 ::= empty | E1a
  E1a ::= + E
  F ::= F1 | F2
  F1 ::= num
  F2 ::= id
Grammar structure = class structure.
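In Java, "grammar structure = class structure" gives one class per non-terminal: abstract classes for unions, fields for sequences. The `value()` evaluation method below is an assumption added purely to make the recursion visible; it is not part of the slide.

```java
// Grammar: E ::= F E1, E1 ::= empty | E1a, E1a ::= + E,
//          F ::= F1 | F2, F1 ::= num, F2 ::= id.
// Branches -> inheritance ("is-a"), sequences -> composition ("has-a").
class E {                         // sequence: E has-a F and has-a E1
    final F f; final E1 e1;
    E(F f, E1 e1) { this.f = f; this.e1 = e1; }
    int value() { return e1.valueWith(f.value()); }
}
abstract class E1 { abstract int valueWith(int left); }   // union
class E1Empty extends E1 {        // E1 ::= empty
    int valueWith(int left) { return left; }
}
class E1a extends E1 {            // E1a ::= + E (has-a E; "+" is implicit)
    final E rest;
    E1a(E rest) { this.rest = rest; }
    int valueWith(int left) { return left + rest.value(); }
}
abstract class F { abstract int value(); }                // union
class F1 extends F {              // F1 ::= num
    final int num;
    F1(int num) { this.num = num; }
    int value() { return num; }
}
class F2 extends F {              // F2 ::= id (stub: a real version would
    final String id;              // look the identifier up in an environment)
    F2(String id) { this.id = id; }
    int value() { return 0; }
}

class GrammarDemo {
    public static void main(String[] args) {
        // 1 + 2, built directly as a parse tree:
        E tree = new E(new F1(1), new E1a(new E(new F1(2), new E1Empty())));
        System.out.println(tree.value()); // prints 3
    }
}
```

Each class sees only its local piece of the grammar, exactly as the slide requires.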
Slide 12: Modeling an LL(1) Grammar
No predictive parsing table! The model is declarative, not procedural: model the grammar, not the parsing!
Slide 13: Detailed and Global Analysis → Abstract and Local Analysis!
In the traditional view, to process E ::= F E1 we must first know about F and E1; but to process F ::= F1 | F2 we must first know about F1 and F2; and to process F1 ::= num we must first know about num. Rules are interdependent — one rule needs the functionality of another, creating circular relationships — so the processing of one rule requires deep knowledge of the whole grammar.
A delegation model breaks this dependency. Visitors to tokens determine the parsing that occurs for each grammar rule (replacing switch statements), so we never need to know which concrete token arrived or which rules it triggers; we can think in terms of abstract behaviors. To process E, all we need is the ability to process F and E1, independent of how either F or E1 is processed: since parsing is done with visitors to tokens, all we need to parse E are the visitors that parse F and E1.
But E does not know what it takes to make the F and E1 parsing visitors... or does it? Abstract behaviors call for abstract construction: Abstract Factories create the concrete instances of the abstract behaviors — a branching factory for unions and a sequence factory for compositions. Abstract factories decouple the rules.
Slide 14: Factory Model of Parser
  E ::= F E1
  E1 ::= empty | E1a
  E1a ::= + E
  F ::= F1 | F2
  F1 ::= num
  F2 ::= id
Parser structure = factory structure: the grammar is represented purely with composition.
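A compact sketch of the factory idea follows. The real system's rule objects and token visitors are collapsed into lambdas here, and the string parse-tree output, names, and `Supplier`-based factories are all assumptions made for brevity; what the sketch preserves is that each rule references only the *factories* of the rules it uses.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.function.Supplier;

// Minimal token stream; tokens are plain strings for brevity.
class TokenStream {
    private final Deque<String> toks;
    TokenStream(String... ts) { toks = new ArrayDeque<>(Arrays.asList(ts)); }
    String peek() { return toks.peek(); }
    String next() { return toks.poll(); }
    boolean atEnd() { return toks.isEmpty(); }
}

interface RuleParser { String parse(TokenStream ts); }

class ParserFactories {
    // Each rule is a factory (Supplier) of its parsing action. Because a
    // rule asks the factory for a parser only at parse time, the circular
    // E -> E1 -> E dependency resolves lazily with no global analysis.
    final Supplier<RuleParser> fFact = () -> ts ->
        "F(" + ts.next() + ")";                            // F ::= num | id
    final Supplier<RuleParser> e1Fact = () -> ts -> {
        if (!ts.atEnd() && "+".equals(ts.peek())) {        // E1a ::= + E
            ts.next();
            return "E1(+ " + this.eFact.get().parse(ts) + ")";
        }
        return "E1()";                                     // E1 ::= empty
    };
    final Supplier<RuleParser> eFact = () -> ts ->         // E ::= F E1
        "E(" + fFact.get().parse(ts) + " " + e1Fact.get().parse(ts) + ")";
}

class FactoryDemo {
    public static void main(String[] args) {
        RuleParser p = new ParserFactories().eFact.get();
        System.out.println(p.parse(new TokenStream("1", "+", "x")));
        // prints E(F(1) E1(+ E(F(x) E1())))
    }
}
```

Note that `eFact` never looks inside `fFact` or `e1Fact`; it only composes them, which is the decoupling the slide claims.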
Slide 15: Extending the Grammar
Adding new tokens and rules has a highly localized impact on the code, and no prediction tables need to be re-computed.
Slide 16: Extending the Grammar
Original grammar:
  E ::= F E1
  E1 ::= empty | E1a
  E1a ::= + E
  F ::= F1 | F2
  F1 ::= num
  F2 ::= id
Extended grammar (adds parentheses and multiplication):
  E ::= S E1
  E1 ::= empty | E1a
  E1a ::= + E
  S ::= P | T
  P ::= ( E )
  T ::= F T1
  T1 ::= empty | T1a
  T1a ::= * S
  F ::= F1 | F2
  F1 ::= num
  F2 ::= id
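Continuing the factory-style sketch under the same assumptions as before (lambdas stand in for the rule classes, names and string output are illustrative), the extension demonstrates the localized impact: F and E1 are untouched, S, P, T, and T1 slot in as new factories, and E's single reference to F becomes a reference to S.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.function.Supplier;

class ExtTokenStream {
    private final Deque<String> toks;
    ExtTokenStream(String... ts) { toks = new ArrayDeque<>(Arrays.asList(ts)); }
    String peek() { return toks.peek(); }
    String next() { return toks.poll(); }
    boolean atEnd() { return toks.isEmpty(); }
}

interface ExtRuleParser { String parse(ExtTokenStream ts); }

class ExtendedGrammar {
    // Unchanged from the original grammar:
    final Supplier<ExtRuleParser> fFact = () -> ts ->
        "F(" + ts.next() + ")";                            // F ::= num | id
    final Supplier<ExtRuleParser> e1Fact = () -> ts -> {
        if (!ts.atEnd() && "+".equals(ts.peek())) {        // E1a ::= + E
            ts.next();
            return "E1(+ " + this.eFact.get().parse(ts) + ")";
        }
        return "E1()";                                     // E1 ::= empty
    };
    // New rules, added without touching the ones above:
    final Supplier<ExtRuleParser> sFact = () -> ts -> {
        if ("(".equals(ts.peek())) {                       // P ::= ( E )
            ts.next();
            String e = this.eFact.get().parse(ts);
            ts.next();                                     // ")" (no error handling)
            return "P(" + e + ")";
        }
        return this.tFact.get().parse(ts);                 // S ::= T
    };
    final Supplier<ExtRuleParser> t1Fact = () -> ts -> {
        if (!ts.atEnd() && "*".equals(ts.peek())) {        // T1a ::= * S
            ts.next();
            return "T1(* " + this.sFact.get().parse(ts) + ")";
        }
        return "T1()";                                     // T1 ::= empty
    };
    final Supplier<ExtRuleParser> tFact = () -> ts ->      // T ::= F T1
        "T(" + fFact.get().parse(ts) + " " + t1Fact.get().parse(ts) + ")";
    // The one edited rule: E now starts with S instead of F.
    final Supplier<ExtRuleParser> eFact = () -> ts ->      // E ::= S E1
        "E(" + sFact.get().parse(ts) + " " + e1Fact.get().parse(ts) + ")";
}

class ExtensionDemo {
    public static void main(String[] args) {
        ExtRuleParser p = new ExtendedGrammar().eFact.get();
        System.out.println(p.parse(new ExtTokenStream("2", "*", "x")));
        // prints E(T(F(2) T1(* T(F(x) T1()))) E1())
    }
}
```

No FIRST/FOLLOW recomputation was needed; each new rule makes its LL(1) decision locally by peeking at the next token.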
Slide 17: Parser Demo
(If time permits.) "We change your grammar in two minutes while you wait!"
Slide 18: Automatic Parser Generator
No additional theory is needed for the generalization: no fixed points, no FIRST and FOLLOW sets. Kooprey parser generator: BNF → Java. kou·prey (noun): "a rare short-haired ox (Bos sauveli) of forests of Indochina […]" (Merriam-Webster Online). Extensions: skip generation of source code and create the parser at runtime instead.
Slide 19: Conclusion
Simple enough to introduce near the end of a CS2 course. Teaches an abstraction of grammars and parsing. Reinforces foundational OO principles:
- Abstract representations
- Abstract construction
- Decoupled systems
- Recursion