A Monadic-Memoized Solution for Left-Recursion Problem of Combinatory Parser Rahmatullah Hafiz 60-520 Fall, 2005.

Outline
- Part 1: Basic Concepts
- Part 2: Related Works
- Part 3: Our Solution

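Parsing
- Parsing is the process of determining whether an input string can be recognized by the set of rules (the grammar) of a specific language.
- A parser is a program that does parsing.
- Example: input "2*5+6" with the rules
    exp   -> digit op exp | digit
    digit -> 0 | 1 | ... | 9
    op    -> * | +
[Figure: top-down parsing of "2*5+6": exp expands to digit op exp; digit recognizes "2" and op recognizes "*", leaving "5+6" for exp.]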

Top-Down Parsing
- Recognition of the input starts at the root and proceeds towards the leaves.
- The input is recognized from left to right.
- Recursive-descent (backtracking) parsing: if one rule fails, the parser tries another rule recursively.
- Comparatively easy to construct.
- Exponential time in the worst case.
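The grammar on the Parsing slide can be turned into a top-down backtracking recognizer with a few higher-order functions. This is a minimal sketch, assuming the "list of successes" representation used later in these slides (a recognizer returns every input suffix left after a successful match, and [] means failure). The names satisfy, orP, thenP, expr and recognizes are our own; exp is renamed expr because exp is already a Prelude function.

```haskell
-- Recognizer for: exp ::= digit op exp | digit ; digit ::= 0 | ... | 9 ; op ::= * | +
import Data.Char (isDigit)

-- A recognizer returns every input suffix left after a successful match; [] = failure.
type Recognizer = String -> [String]

-- Recognize one character satisfying a predicate.
satisfy :: (Char -> Bool) -> Recognizer
satisfy ok (x:xs) | ok x = [xs]
satisfy _ _ = []

digit, op :: Recognizer
digit = satisfy isDigit
op    = satisfy (`elem` "*+")

orP, thenP :: Recognizer -> Recognizer -> Recognizer
(p `orP` q) input   = p input ++ q input       -- keep the results of both alternatives
(p `thenP` q) input = concatMap q (p input)    -- run q on every suffix p leaves

-- Named expr because exp is already defined in the Prelude.
expr :: Recognizer
expr = (digit `thenP` op `thenP` expr) `orP` digit

-- The whole string is an exp if some parse leaves no input.
recognizes :: String -> Bool
recognizes input = "" `elem` expr input
```

For "2*5+6", expr returns ["","+6","*5+6"]: one remainder per way the rules can match a prefix, and the empty remainder shows that the whole string is an exp. Because orP keeps both alternatives, every choice is explored, which is where the exponential worst case comes from.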

Combinatory Parsers
- Parsers written in functional languages, typically lazy functional languages (Miranda, Haskell).
- Can be used to parse natural languages (English) as well as formal languages (Haskell).
- The language must be described by a set of rules: a context-free grammar.

Why a Lazy Functional Language?
- Modular code that is easy to implement.
- Higher-order functions can represent the BNF notation of a CFG.
- Functions are first-class citizens.
- Suitable for top-down, recursive-descent, fully backtracking parsing.
- Higher-order functions: their input/output arguments can themselves be functions.

Example
* Frost and Launchbury (JFP, 1989): a natural-language interpreter (NLI) in Miranda
* Hutton (JFP, 1992): uses of combinatory parsers
BNF notation of a CFG:
    s ::= a s | empty
We can read it as "s is either a then s, or empty", i.e.
    s = a then s or empty
It is possible to write parsers exactly like this in a lazy functional language using higher-order functions.

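Example (cont.)

```haskell
-- s ::= a s | empty   (each recognizer returns the list of unconsumed inputs)
-- `then` is reserved in Haskell and `or` clashes with the Prelude,
-- so the combinators are named thenP and orP here.
type Recognizer = String -> [String]
empty :: Recognizer
empty input = [input]
a :: Recognizer
a ('a':xs) = [xs]
a _        = []
orP, thenP :: Recognizer -> Recognizer -> Recognizer
(p `orP` q) input   = p input ++ q input
(p `thenP` q) input = concatMap q (p input)   -- run q on every remainder of p
s :: Recognizer
s input = ((a `thenP` s) `orP` empty) input
-- *Main> s "aaa"
-- ["","a","aa","aaa"]
```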

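Problem with Left-Recursive Grammars
- Right-recursive grammar s ::= a s | a, input "aaa": each call of s consumes an "a" before recursing, producing the (recognized, remaining) pairs ("a", "aa"), ("aa", "a") and ("aaa", ""). Parsing terminates.
- Left-recursive grammar s ::= s a | a, input "aaa": s calls itself again before consuming any input, so the tree keeps growing (s -> s a -> s a a -> ...) while the remaining input stays "aaa". Parsing never terminates.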
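The terminating right-recursive case can be sketched in Haskell by returning the same (recognized, remaining) pairs as the figure. This is an illustrative sketch (the name sR is ours): every recursive call first consumes an "a", so the recursion depth can never exceed the input length.

```haskell
-- s ::= a s | a, returning (recognized, remaining) pairs.
sR :: String -> [(String, String)]
sR inp =
  [ ('a' : done, rest') | ('a' : rest) <- [inp], (done, rest') <- sR rest ]  -- s ::= a s
  ++ [ ("a", rest) | ('a' : rest) <- [inp] ]                                 -- s ::= a
```

sR "aaa" evaluates to [("aaa",""),("aa","a"),("a","aa")]. Swapping the rule to s ::= s a would make sR call itself on the unchanged input before any pattern match, which is exactly the non-terminating case.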

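Example

```haskell
-- s ::= s a | empty   (same combinators as before, left-recursive rule)
type Recognizer = String -> [String]
empty :: Recognizer
empty input = [input]
a :: Recognizer
a ('a':xs) = [xs]
a _        = []
orP, thenP :: Recognizer -> Recognizer -> Recognizer
(p `orP` q) input   = p input ++ q input
(p `thenP` q) input = concatMap q (p input)
s :: Recognizer
s input = ((s `thenP` a) `orP` empty) input   -- s calls itself on the same input first
-- *Main> s "aaa"
-- *** Exception: stack overflow   (never terminates)
```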

Why Bother with Left-Recursive Parsing?
- Left-recursive grammars are easy to implement and very modular.
- In natural language processing, the expected ambiguity can be captured easily.
- Non-left-recursive grammars might not generate all possible parses.
- Because the parse tree of a left-recursive grammar grows from the root all the way down to the bottom level, it ensures that all possible parses are found.

Related Approaches
- Approaches that lead to exponential time complexity:
  - Lickman (1995): fixed-point solution
- Most approaches transform the left-recursive grammar into a non-left-recursive grammar:
  - Frost (1992): guarding left-recursive productions with non-left-recursive productions
  - Hutton (1992, 1996): grammar transformation

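Related Approaches (cont.)
Transforming a left-recursive grammar into a non-left-recursive grammar:
- violates the semantic rules
- produces parse trees with a completely different structure
Example (original grammar on the left, transformed grammar on the right):
    s ::= s a   [rule 1]        s  ::= b s'          [rule ??]
        | b     [rule 2]        s' ::= a s' | empty  [rule ??]
The semantic rules attached to the original productions have no obvious counterpart in the transformed grammar.
[Figure: under the original grammar the parse tree is left-branching (s -> s a, with b at the bottom left); under the transformed grammar it is right-branching (s -> b s', s' -> a s', ..., s' -> empty).]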
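The difference in tree shape can be made concrete. A minimal sketch, assuming a simple rose-tree type Tree of our own: s and s' parse the transformed grammar and build its trees, while leftTree is the tree the original left-recursive rule would assign to "baa", written out by hand because a naive top-down parser cannot build it.

```haskell
-- Parse trees for the transformed grammar: s ::= b s' ; s' ::= a s' | empty
data Tree = Node String [Tree] deriving (Eq, Show)

-- A parser returns every (tree, remaining input) pair it can produce.
type Parser = String -> [(Tree, String)]

leaf :: String -> Tree
leaf x = Node x []

term :: Char -> Parser
term c (x:xs) | x == c = [(leaf [c], xs)]
term _ _ = []

s :: Parser
s inp = [ (Node "s" [tb, t], r2) | (tb, r1) <- term 'b' inp, (t, r2) <- s' r1 ]

s' :: Parser
s' inp = [ (Node "s'" [ta, t], r2) | (ta, r1) <- term 'a' inp, (t, r2) <- s' r1 ]
      ++ [ (Node "s'" [leaf "empty"], inp) ]

-- Keep only the parses that consume the whole input.
parses :: Parser -> String -> [Tree]
parses p inp = [ t | (t, "") <- p inp ]

-- The tree the original rule s ::= s a | b assigns to "baa", written out by hand.
leftTree :: Tree
leftTree = Node "s" [Node "s" [Node "s" [leaf "b"], leaf "a"], leaf "a"]
```

parses s "baa" yields a single right-branching tree, s [b, s' [a, s' [a, s' [empty]]]], which differs from leftTree; any semantic rule written against the left-branching shape would have to be rewritten for it.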

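Our Approach: Monadic Memoization
* Frost and Hafiz (2005)
- The idea is simple: do not let the parse tree grow infinitely.
- For the left-recursive grammar s ::= s a | a and input "aaa" (length 3), each nested left-recursive call of s increases the depth (1, 2, 3) while the length stays 3.
- The parser "fails" when depth >= length.
* Wadler (1985): How to replace failure by a list of successes
[Figure: the tree s -> s a -> s a ... for input "aaa", annotated with Depth = 1, 2, 3 and Length = 3 at successive levels.]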

Monadic-Memoization (cont.)
- When the recognizer fails, the production s ::= s a fails too.
- Control returns to the upper level with "failure".
- Backtracking: the parser tries the alternatives.
- If an alternative rule succeeds, control returns to the upper level with the remaining unrecognized input.
- The procedure is recursive.
[Figure: the same tree for s ::= s a | a with input "aaa", Depth = 1, 2, 3 and Length = 3.]

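Monadic-Memoization (cont.)
- For s ::= s a | a with input "aaa" (Length = 3), at Depth = 3 the production s ::= s a fails but s ::= a succeeds.
- Backtracking then carries the remaining input back up the levels.
[Figure: at Depth = 3, s ::= s a returns [fail] and s ::= a recognizes one "a", leaving "aa"; moving back up, each level's s a recognizes another "a" until the remaining input is "".]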
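The curtailment described on these slides can be sketched directly for s ::= s a | a. This is our own minimal illustration, not the authors' monadic code: it assumes a single left-recursive nonterminal, passes the depth as an explicit Int argument (starting at 1, as in the figures), compares it with the input length using the slides' rule "fails when depth >= length", and leaves out memoization entirely.

```haskell
-- s ::= s a | a, with the left-recursive alternative curtailed.
type Recognizer = String -> [String]

a :: Recognizer
a ('a':xs) = [xs]
a _        = []

thenP :: Recognizer -> Recognizer -> Recognizer
(p `thenP` q) input = concatMap q (p input)

-- depth = number of nested left-recursive calls of s on this input (top call = 1).
-- In this grammar s only calls itself at the same position, so the input
-- length is also the length of the remaining input.
s :: Int -> Recognizer
s depth input
  | depth >= length input = a input                              -- curtail: s ::= s a fails, try s ::= a
  | otherwise = (s (depth + 1) `thenP` a) input ++ a input       -- s ::= s a, then s ::= a
```

s 1 "aaa" returns ["","a","aa"]; the empty remainder is the complete parse ((a) a) a. Each left-recursive call raises the depth, so the recursion stops after at most length input levels instead of looping.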

Monadic-Memoization (cont.)
- This approach is applicable in a mixed environment: the grammar may contain both left-recursive and non-left-recursive productions, e.g.
    s ::= s a | a
    a ::= b a | b
- During parsing, the same rule may be executed on the same input multiple times.
- The parser also needs to keep track of the input length and the depth.
- Top-down backtracking is exponential.

Monadic-Memoization (cont.)
- Memoization is helpful.
- The idea is to check a "memo" table, passed along with the input, on every recursive call.
- The memo table contains:
  - the list of previously parsed outputs for each input, paired with the corresponding production;
  - the length and depth of the current parse.
- Before parsing any input: if the lookup in the memo table fails, perform the parse and update the memo table; otherwise, return the result from the table.

Monadic-Memoization (cont.)
[Figure: parse tree for input "aa". Production a "looks up" the memo table before parsing and "updates" it afterwards; the left-recursive (LR) production keeps track of length and depth.]
Memo table shown on the slide:
    (["","a","aa"],[("a",[("aa",["","a","aa"]),("a",["","a"]),(10,11)])])

Monadic-Memoization (cont.)
- Memoization reduces the worst-case time complexity from exponential to O(n^3).
- The problem: lazy functional languages do not allow updating variables or keeping global storage for the whole program.
- The memo table must therefore be passed around so that all recursive parsing calls can access it.
- If the memo table is passed as an explicit function argument, the code gets messy and error-prone.
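To see why passing the table by hand is awkward, here is a minimal sketch of the "look up, otherwise parse and update" rule with the memo table threaded explicitly. Assumptions: the table is simplified to an association list keyed by (production name, input), without the length and depth fields, and the grammar is the right-recursive s ::= a s | empty; memoize, body and threadAll are illustrative names of our own.

```haskell
-- Simplified memo table (assumption): (production name, input) -> list of remainders.
type Memo = [((String, String), [String])]

-- Every recognizer takes the table as an extra argument and returns the updated one.
type Rec = String -> Memo -> ([String], Memo)

term :: Char -> Rec
term c (x:xs) memo | x == c = ([xs], memo)
term _ _ memo = ([], memo)

-- Look the call up first; on a miss, parse and record the result.
memoize :: String -> Rec -> Rec
memoize name p inp memo = case lookup (name, inp) memo of
  Just rs -> (rs, memo)
  Nothing -> let (rs, memo1) = p inp memo
             in (rs, ((name, inp), rs) : memo1)

-- s ::= a s | empty, with the table threaded through every call by hand.
s :: Rec
s = memoize "s" body
  where
    body inp memo0 =
      let (r1, memo1) = term 'a' inp memo0      -- a
          (r2, memo2) = threadAll r1 memo1      -- then s, on every remainder
      in (r2 ++ [inp], memo2)                   -- or empty
    threadAll [] memo = ([], memo)
    threadAll (r:rs) memo =
      let (x, memo')  = s r memo
          (y, memo'') = threadAll rs memo'
      in (x ++ y, memo'')
```

s "aaa" [] returns ["","a","aa","aaa"] together with a four-entry table; calling s "aaa" again with that table is answered by a lookup. Every intermediate table (memo0, memo1, memo2, ...) has to be named and passed on correctly, which is the bookkeeping a monad can hide.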

Monadic-Memoization (cont.)
Alternatively, we can use a monad:
* Derived from category theory: Moggi (1989)
* S.E. approaches for lazy functional languages: Wadler (1990), covering state, exceptions, I/O, etc.
* Monadic framework for parsing: Frost (2003)
- Reusable
- Complex tasks can be achieved by adding or modifying existing monadic objects
- Structured computation

Monadic-Memoization (cont.)
A monad is a triple (M, unit, bind):
- M: a type constructor, here
    memo = [([Char],[([Char],[[Char]])],[(Int,Int)])]
    M inp = memo -> (inp, memo)
- unit :: a -> M a takes a value and returns the computation of that value; it works as a "container".
- bind :: M a -> (a -> M b) -> M b applies the computation a -> M b to the computation M a and returns a computation M b; it ensures sequential computation.
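Read as Haskell, M is a state monad whose state is the memo table. A minimal sketch, assuming a simplified memo type (an association list keyed by production name and input, rather than the slide's triple); store, fetch and demo are hypothetical helpers that only show bindS carrying the table from one step to the next. A fuller version would more likely wrap M in a newtype with a Monad instance, or use the State monad from the mtl package.

```haskell
-- Simplified memo table (assumption): (production name, input) -> results
type Memo = [((String, String), [String])]

-- M a = Memo -> (a, Memo): a computation that threads the memo table
type M a = Memo -> (a, Memo)

unitS :: a -> M a
unitS x memo = (x, memo)              -- return x, leave the table unchanged

bindS :: M a -> (a -> M b) -> M b
(m `bindS` k) memo =
  let (x, memo') = m memo             -- run m on the incoming table
  in k x memo'                        -- pass its value and the updated table to k

-- Hypothetical helpers: store an entry, then look one up
store :: (String, String) -> [String] -> M ()
store key results memo = ((), (key, results) : memo)

fetch :: (String, String) -> M (Maybe [String])
fetch key memo = (lookup key memo, memo)

demo :: M (Maybe [String])
demo = store ("s", "aa") ["", "a", "aa"] `bindS` \_ -> fetch ("s", "aa")
```

Running demo [] gives (Just ["","a","aa"], [(("s","aa"),["","a","aa"])]): the entry written by the first step is visible to the second without either step naming the table.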

Monadic-Memoization (cont.)
The mental picture of how a monad works: for the triple (M, unit, bind), M is a loader, unit is a tray, and bind is a combiner.
* Picture source: Newbern (2003)

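Monadic-Memoization (cont.)
The combinatory parsers are transformed into monadic objects. For example, the original "or" recognizer

```haskell
(p `or` q) inp = p inp ++ q inp
```

becomes, in monadic form (bindS threads the memo table; nub, from Data.List, removes duplicate results):

```haskell
(p `or` q) inp = p inp `bindS` f
  where
    f m = q inp `bindS` g
      where
        g n = unitS (nub (m ++ n))
```

[Figure: the monadic object "or" for s ::= a or b on input "ab": starting from Memo1, each alternative looks up the memo table, checks for left recursion, parses, and updates the memo table, producing Memo2 and the parsed output.]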
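Putting these pieces together gives a small runnable sketch of memoized monadic recognizers. It uses the slide's monadic "or" (named orS here, because or is a Prelude function) plus our own thenS, term, emptyS and memoize. As assumptions, the memo table is simplified to an association list keyed by (production name, input), and the grammar is the right-recursive s ::= a s | empty, so the length and depth bookkeeping the slides keep for left-recursive productions is left out.

```haskell
import Data.List (nub)

-- Simplified memo table (assumption): (production name, input) -> results
type Memo = [((String, String), [String])]
type M a = Memo -> (a, Memo)
type Rec = String -> M [String]     -- a recognizer inside the monad

unitS :: a -> M a
unitS x memo = (x, memo)

bindS :: M a -> (a -> M b) -> M b
(m `bindS` k) memo = let (x, memo') = m memo in k x memo'

term :: Char -> Rec
term c (x:xs) | x == c = unitS [xs]
term _ _ = unitS []

emptyS :: Rec
emptyS inp = unitS [inp]

-- The monadic "or" from the slide, written with lambdas instead of where-clauses.
orS :: Rec -> Rec -> Rec
(p `orS` q) inp = p inp `bindS` \m -> q inp `bindS` \n -> unitS (nub (m ++ n))

-- Sequencing: run q on every remainder left by p, threading the table.
thenS :: Rec -> Rec -> Rec
(p `thenS` q) inp = p inp `bindS` go
  where
    go []       = unitS []
    go (r:rest) = q r `bindS` \x -> go rest `bindS` \y -> unitS (nub (x ++ y))

-- Look the call up in the table; on a miss, parse and record the result.
memoize :: String -> Rec -> Rec
memoize name p inp memo = case lookup (name, inp) memo of
  Just rs -> (rs, memo)
  Nothing -> let (rs, memo') = p inp memo
             in (rs, ((name, inp), rs) : memo')

-- s ::= a s | empty, memoized
s :: Rec
s = memoize "s" ((term 'a' `thenS` s) `orS` emptyS)
```

s "aaa" [] returns ["","a","aa","aaa"] and a table with one entry per input position; in (s "aaa" `bindS` \_ -> s "aaa") [] the second call is answered from the table. Unlike the explicitly threaded version, none of the grammar code mentions the table.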

Monadic-Memoization (cont.)
[Figure: parse tree for input "aa". Production a "looks up" the memo table before parsing and "updates" it afterwards, the left-recursive (LR) production keeps track of length and depth, and the memo table is propagated through every call.]
Because the memo table is propagated correctly, the parser runs in O(n^3) time.

References
1. Frost, R. A. and Launchbury, E. J. (1989) Constructing natural language interpreters in a lazy functional language. The Computer Journal, Special Issue on Lazy Functional Programming, 32(2).
2. Wadler, P. (1992) Monads for functional programming. Computer and Systems Sciences.
3. Hutton, G. (1992) Higher-order functions for parsing. Journal of Functional Programming, 2(3). Cambridge University Press.
4. Frost, R. A. (1992) Guarded attribute grammars: top down parsing and left recursive productions. SIGPLAN Notices, 27(6).
5. Lickman, P. (1995) Parsing with fixed points. Master's thesis, Oxford University.
6. Frost, R. A. and Szydlowski, B. (1996) Memoizing purely functional top-down backtracking language processors. Science of Computer Programming, 27(3).
7. Hutton, G. (1998) Monadic parsing in Haskell. Journal of Functional Programming, 8(4). Cambridge University Press.
8. Frost, R. A. (2003) Monadic memoization towards correctness-preserving reduction of search. Canadian Conference on AI 2003: 66-80.