Lazy Combinators for Executable Specifications of General Attribute Grammars Rahmatullah Hafiz and Richard A. Frost

Presentation transcript:

1 Lazy Combinators for Executable Specifications of General Attribute Grammars. Rahmatullah Hafiz and Richard A. Frost, Department of Computer Science, University of Windsor, Ontario, Canada. Presented at PADL’2010, Madrid, Spain.

2 The Problem
− Knowledge of parsing techniques and semantic evaluation methods is required in order for software developers to create natural language (NL) processors, which restricts the creation of NL interfaces to many applications.
− One solution is to allow language processors to be created as executable specifications of grammars annotated with semantic rules. The most modular approach is to use top-down parsing.

3 The Problem (cont.) However, naïve algorithms for top-down parsing:
1. Do not terminate for context-free syntax that includes direct/indirect left-recursion − addressed by Frost, Hafiz and Callaghan (PADL 2008)
2. Require exponential time and space for ambiguous grammars, e.g., grammars for NL
3. Do not provide support for general attribute relationships (including the use, by a parser for construct p, of inherited attributes associated with syntactic constructs “to the right” of p on the right hand side of syntactic rules).
Items 2 and 3 are the primary focus of this paper.

4 Importance of the Problem − Accommodating ambiguity is essential as natural languages have ambiguous grammars. − Transforming a left-recursive grammar to a weakly equivalent non-left-recursive form can introduce loss of parses and difficulty in specifying semantics. − Declarative semantics with arbitrary attribute dependencies provide unrestricted accommodation of NL semantics. For example, Montague style compositional semantics.

5 Importance of the Problem (cont.) − Language developers can build executable specifications of language processors without worrying about syntax-semantics evaluation methods and order. − Modularity allows individual parts of the specifications to be constructed and tested separately. − Polynomial time and space are required for parsing highly ambiguous languages, such as NL, in real time.

6 An example of a general CFG
start(S0) ::= tree(T0)
tree(T0) ::= tree(T1) tree(T2) num(N1) | num(N2)
num(N0) ::= 1 | 2 ……

When applied to the input, this left-recursive and ambiguous context-free syntax generates two ambiguous trees:

         ___tree___
        /     |    \
    tree    tree    num
    /      /  |  \     \
  num   tree tree num   2
   |     |    |     \
   1    num  num     3
         |    |
         4    5

         ___tree___
        /     |    \
     tree    tree   num
    /  |  \     \      \
 tree tree num  num     2
  |    |    |    |
 num  num   5    3
  |    |
  1    4

7 An example problem
Suppose that we want to replace all leaves of the trees with the max value of the input, giving:

         ___tree___
        /     |    \
    tree    tree    num
    /      /  |  \     \
  num   tree tree num   5
   |     |    |     \
   5    num  num     5
         |    |
         5    5

         ___tree___
        /     |    \
     tree    tree   num
    /  |  \     \      \
 tree tree num  num     5
  |    |    |    |
 num  num   5    5
  |    |
  5    5

One approach is to specify this problem as a language processing problem, and to define the solution using an executable attribute grammar.

8 An executable specification
start(S0) ::= tree(T0) { RepVal.T0↓ = MaxVal.T0↑ }
tree(T0) ::= tree(T1) tree(T2) num(N1)
               { MaxVal.T0↑ = max(MaxVal.T1↑, MaxVal.T2↑, MaxVal.N1↑),
                 RepVal.T1↓ = RepVal.T0↓,
                 RepVal.T2↓ = RepVal.T0↓,
                 RepVal.N1↓ = RepVal.T0↓ }
           | num(N2)
num(N0) ::= 1 { MaxVal.N0↑ = 1 } | 2 { MaxVal.N0↑ = 2 } ..
↓ = attributes propagating downwards, i.e. inherited attributes
↑ = attributes propagating upwards, i.e. synthesized attributes
Our objective is to have a program that is isomorphic with this grammar, i.e. an executable specification.
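The attribute flow in this specification can be illustrated independently of parsing. The following is a minimal sketch, not the authors' combinator code: it assumes the syntax tree has already been built, and the names Tree, walk, repMax and example are hypothetical. It shows how lazy evaluation lets the inherited RepVal of the root be defined as the root's own synthesized MaxVal, within a single traversal.

```haskell
-- Hypothetical tree type mirroring the grammar:
--   tree ::= tree tree num | num
data Tree = Node Tree Tree Int | Leaf Int
  deriving (Eq, Show)

-- One traversal computes the synthesized attribute (MaxVal, passed up)
-- and rebuilds the tree with the inherited attribute (RepVal, passed down).
walk :: Int -> Tree -> (Int, Tree)
walk rep (Leaf n) = (n, Leaf rep)
walk rep (Node l r n) = (maximum [maxL, maxR, n], Node l' r' rep)
  where
    (maxL, l') = walk rep l
    (maxR, r') = walk rep r

-- The start rule ties the knot: RepVal of the root is its own MaxVal.
-- The circular binding is well-defined under lazy evaluation because
-- walk never inspects rep while computing the maximum.
repMax :: Tree -> Tree
repMax t = t'
  where
    (m, t') = walk m t

-- A tree with leaves 1 4 5 3 2, as in the first tree of slide 6.
example :: Tree
example = Node (Leaf 1) (Node (Leaf 4) (Leaf 5) 3) 2
```

In GHCi, repMax example evaluates to Node (Leaf 5) (Node (Leaf 5) (Leaf 5) 5) 5. In a strict language the same result would need two passes, one to find the maximum and one to replace the leaves.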

9 Our Solution
− We have constructed the first top-down parsing algorithm that supports executable specifications of fully general CFGs annotated with fully general declarative semantic rules in polynomial time and space w.r.t. the length of the input.
− We have implemented the algorithm as a set of non-strict, purely functional combinators, i.e. higher-order functions:
  <|> and *> for alternating and sequencing rules
  rule_s and rule_i for synthesized and inherited rules
  parser, nt for complete AG rules
  memoize for converting parsers to memoized versions

10 Solution (cont.) Our parser combinators are functions that map the current start position to a set of end positions, where each end is mapped to a set of syntax trees. For example:
data Atts = MaxVal {getAVAL :: Int} | Binary_OP {getB_OP :: (Int -> Int -> Int)} ...
type Start/End = (Int,[(Instance, [Atts])])
data PTree v = Leaf (v,Instance) | Branch [PTree v] | SubNode ((Label, Instance), (Start,End))
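The idea of mapping a start position to a set of end positions can be sketched in a much reduced form. The code below is an illustrative assumption, not the X-SAIGA implementation: it drops syntax trees and attributes and keeps only end positions, and the names Recognizer, term, epsilon, thenR, orR and s are hypothetical.

```haskell
import Data.List (nub, sort)

type Pos = Int

-- Hypothetical simplification: a recognizer maps the input tokens and a
-- start position to the sorted, duplicate-free list of end positions.
type Recognizer = [String] -> Pos -> [Pos]

-- Recognize one terminal at position j.
term :: String -> Recognizer
term w inp j
  | j < length inp && inp !! j == w = [j + 1]
  | otherwise = []

-- Recognize the empty string.
epsilon :: Recognizer
epsilon _ j = [j]

-- Sequencing: run q from every end position returned by p.
thenR :: Recognizer -> Recognizer -> Recognizer
thenR p q inp j = nub (sort [e2 | e1 <- p inp j, e2 <- q inp e1])

-- Alternation: merge the end positions of both alternatives.
orR :: Recognizer -> Recognizer -> Recognizer
orR p q inp j = nub (sort (p inp j ++ q inp j))

-- An ambiguous (but not left-recursive) grammar: s ::= "a" s s | empty
s :: Recognizer
s = (term "a" `thenR` s `thenR` s) `orR` epsilon
```

For example, s ["a","a","a"] 0 evaluates to [0,1,2,3]: every prefix of the input is a sentence of the grammar. Because ambiguous derivations that end at the same position collapse into one entry, the result stays small even when the number of parses grows quickly.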

11 Solution (cont.) Memoization allows use of previously-computed results:
− The memo-table is threaded through parser executions using a state monad:
type MemoTable = [(Label,[(Start, (Context,Result))])]
type Result = [((Start, End),[PTree Label])]
− A lookup is performed first. If the parser has not previously been applied at this position, a new result is constructed and the memo-table is updated.
− Memoization groups local syntactic ambiguities and common semantic values under the current position in a newly-formed result.
− Parsers pass up a reference/pointer to their memo-table entry to upper-level parsers, instead of the complete result.
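The lookup-then-update behaviour can be sketched as follows. This is a simplified illustration under stated assumptions: the table is threaded by explicit state passing rather than a state monad, it stores only end positions (no Context, trees or attributes), and all names (Table, MRecognizer, memoR, termM, epsilonM, thenM, orM, sM) are hypothetical.

```haskell
import Data.List (nub, sort)

type Pos = Int
type Label = String

-- Hypothetical, reduced memo-table: (parser label, start) -> end positions.
type Table = [((Label, Pos), [Pos])]

-- A recognizer takes a start position and the current table, and returns
-- its end positions together with the (possibly extended) table.
type MRecognizer = Pos -> Table -> ([Pos], Table)

-- Reuse a saved result if one exists; otherwise compute and record it.
memoR :: Label -> MRecognizer -> MRecognizer
memoR name p j table =
  case lookup (name, j) table of
    Just ends -> (ends, table)
    Nothing ->
      let (ends, table') = p j table
       in (ends, ((name, j), ends) : table')

termM :: [String] -> String -> MRecognizer
termM inp w j table
  | j < length inp && inp !! j == w = ([j + 1], table)
  | otherwise = ([], table)

epsilonM :: MRecognizer
epsilonM j table = ([j], table)

-- Alternation: the table produced by p is handed on to q.
orM :: MRecognizer -> MRecognizer -> MRecognizer
orM p q j t0 =
  let (es1, t1) = p j t0
      (es2, t2) = q j t1
   in (nub (sort (es1 ++ es2)), t2)

-- Sequencing: q is run from each end of p, threading the table throughout.
thenM :: MRecognizer -> MRecognizer -> MRecognizer
thenM p q j t0 =
  let (es1, t1) = p j t0
      step (acc, t) e = let (es, t') = q e t in (acc ++ es, t')
      (es2, t2) = foldl step ([], t1) es1
   in (nub (sort es2), t2)

-- s ::= "a" s s | empty, memoized under the label "s"
sM :: [String] -> MRecognizer
sM inp = memoR "s" ((termM inp "a" `thenM` sM inp `thenM` sM inp) `orM` epsilonM)
```

Running sM ["a","a","a"] 0 [] returns the ends [0,1,2,3] and a table with exactly one entry for each of the start positions 0 to 3, so each (parser, position) pair is computed once and looked up afterwards.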

12 Solution: General Syntax The basic syntax analysis technique is based on Frost, Hafiz and Callaghan (2008):
− To accommodate direct left-recursion, a “left-recursive count” is used for the number of times a parser has been applied to an input position j. This count is increased on recursive descent, and the parser is curtailed whenever the “left-recursive count” of the parser at j exceeds the number of remaining input tokens.
− To accommodate indirect left-recursion, a parser's result is paired with a set of curtailed non-terminals at j within the current parse path, which is used to determine the context in which the result was constructed at j. If the new context strictly subsumes the context for a saved result, then a new result is computed; otherwise the saved result is used.
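The curtailment of direct left-recursion can be sketched as below. This is a hedged illustration only: it covers the direct case without memoization or the context handling needed for indirect left-recursion, and the names Counts, Recognizer, term, orElse, andThen, curtail and expr are hypothetical. As an assumption of the sketch, the bound allows one application more than the number of remaining tokens, so that a parser can still be tried once at the end of the input.

```haskell
import Data.List (nub, sort)
import Data.Maybe (fromMaybe)

type Pos = Int

-- Hypothetical "left-recursive counts": how many times each named parser
-- has been applied at each position on the current descent path.
type Counts = [((String, Pos), Int)]

type Recognizer = Counts -> Pos -> [Pos]

term :: [String] -> String -> Recognizer
term inp w _ j
  | j < length inp && inp !! j == w = [j + 1]
  | otherwise = []

orElse :: Recognizer -> Recognizer -> Recognizer
orElse p q cs j = nub (sort (p cs j ++ q cs j))

andThen :: Recognizer -> Recognizer -> Recognizer
andThen p q cs j = nub (sort [e2 | e1 <- p cs j, e2 <- q cs e1])

-- Increase the count on each descent and fail once it exceeds the bound.
-- Assumption of this sketch: the bound is (remaining tokens + 1).
curtail :: [String] -> String -> Recognizer -> Recognizer
curtail inp name p cs j
  | count > remaining + 1 = []
  | otherwise = p (((name, j), count) : cs) j
  where
    count = 1 + fromMaybe 0 (lookup (name, j) cs)
    remaining = length inp - j

-- A directly left-recursive grammar: expr ::= expr "+" "n" | "n"
expr :: [String] -> Recognizer
expr inp =
  curtail inp "expr"
    ((expr inp `andThen` term inp "+" `andThen` term inp "n") `orElse` term inp "n")
```

With this bound, expr ["n","+","n"] [] 0 terminates and evaluates to [1,3], whereas the same grammar written without curtail would recurse forever on its first alternative.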

13 Solution: General Semantics We have now extended the set of combinators to accommodate fully general semantics:
− Pure, non-strict combinators simplified the accommodation of arbitrary dependencies, including “inheritance from the right”.
− Expressions are formed from an environment of potentially unevaluated attributes returned by the current parser, its predecessor, successors or siblings.

14 Solution: General Semantics (cont.)
− Synthesized attributes are associated with parent nodes, and inherited attributes are associated with their child nodes.
− To maintain the flow of attributes, in addition to being executed on the current Start and Context, a parser must pass down its unique id and a list of its own inherited attributes so that they can be used when executing the succeeding parsers' semantic definitions.
[Figure: attribute environments. For the parent node p_i: its own inherited and synthesized attributes, plus the inherited and synthesized attributes of its children p_m, p_n and p_o. For a child node p_n: its own synthesized and inherited attributes, plus the inherited and synthesized attributes of p_m, p_i and p_o.]

15 Haskell Combinators Our combinators are implemented in Haskell. A parser's alternative rules are formed with the combinator <|>.
− Accommodates alternative syntax with a list of semantic rules.
− Alternatives p and q are applied to the current position and the current context. The id and inherited attributes of the calling parser are passed down to the parsers for p and q. All results from p are passed to q (thereby avoiding duplication of work), and then their results are merged together.
(<|>) :: NTType -> NTType -> NTType
type M a = Start -> Context -> StateMonad a
type ParseResult = (Context, Result)
type NTType = Id -> InsAttVals -> M ParseResult

16 Haskell Combinators (cont.) Parsers are sequenced with the combinator *>.
− In p *> q, p is first applied to the current start position and the current context. Then *> causes p to compute its inherited attributes from the surrounding environment.
− Next, q is applied to the set of ends returned by p. q also computes its inherited attributes from the calling parser's and p's environment.
(*>) :: SeqType -> SeqType -> SeqType, where:
type SemRule = (Instance,(InsAttVals, Id) -> InsAttVals)
type SeqType = Id -> InsAttVals -> [SemRule] -> Result -> M ParseResult

17 Arbitrary Dependencies in Semantics (cont.)
Attribute computation rules are formed with a higher-order wrapper function parser, which
− maps the current parser's synthesized rules to all ending points of the syntactic result.
− causes each parser in the syntax rule to pass down their inherited attributes for future use.
parser :: SeqType -> [SemRule] -> Id -> InsAttVals -> M ParseResult
The higher-order function nt causes parsers to pass down their own identification and a list of inherited attributes
− by applying the grouped expressions on a parser-provided environment that consists of the predecessor ids and the surrounding parsers' synthesized and inherited attributes.
nt :: NTType -> Id -> SeqType

18 Our Example Executable Specification in Haskell The Executable representation of the example attribute grammar specification of slide #8

19 The Result of Executing the Specification

20 Time Complexity w.r.t. #input For non-left-recursive grammars, parsing complexity w.r.t. the length of the input, n, is O(n^3), and for left-recursive grammars, the complexity is O(n^4). When semantic evaluation is taken into account, the time complexity may increase depending on the complexities of the semantic actions.

21 Space Complexity w.r.t. #input n
− Our compact representation of syntax trees allows the parser to save results as a list of one-level-depth branches with attribute values attached to pointing sub-nodes.
− In the memo-table, for each parser's n input positions, we can store n branches corresponding to n end positions. For a branch p *> q, there are n possible ambiguities.
− Hence, we need O(n^3) space in the worst case w.r.t. the length of the input.

22 An example application A simple domain-specific NL interface:
− The Attribute Grammar is a fully-general CFG with 15 non-terminals and 32 AG rules.
− All syntax rules are associated with semantic rules which implement a subset of the set-theoretic version of Montague semantics extracted from Frost and Fortier (2007).
− We define a dictionary/knowledge-base for syntactic categories and their meanings: 15 syntactic categories and 130 words.

23 An example application
dictionary =
 ("moons", Cnoun, [NOUNCLA_VAL set_of_moon]),
 ("atmospheric", Adj, [ADJ_VAL set_of_atmospheric]),
 ("exist", Intransvb, [VERBPH_VAL set_of_things]),
 ("deimos", Pnoun, [TERMPH_VAL (test_wrt 20)]),
 ("person", Cnoun, meaning_of nouncla "man or woman"),
 ("discoverer", Cnoun, meaning_of nouncla "person who discovered something"),

24 An example application (cont.) Example executable specifications of production rules:
jointermph = memoize Jointermph
  parser (nt jointermph S1 *> nt termphjoin S2 *> nt jointermph S3)
    [rule_s TERMPH_VAL OF LHS ISEQUALTO apply_join [synthesized TERMPH_VAL OF S1, synthesized TERMPHJOIN_VAL OF S2, synthesized TERMPH_VAL OF S3]]
  <|>
  parser (nt termph S4)
    [rule_s TERMPH_VAL OF LHS ISEQUALTO copy [synthesized TERMPH_VAL OF S4]]

25 An example application (cont.) Answers hundreds of thousands of questions, e.g.
which moons that were discovered by hall orbit mars → “phobos and deimos”
every planet is orbited by a moon → “false”
how many moons were discovered by hall or kuiper → “four”
did hall discover deimos or phobos and miranda → “no and yes” ** note: ambiguous results

26 Related Work
− Kuno’s (1965) “depth-imposing” algorithm, Nederhof and Koster’s “cancellation-parsing” for DCGs (1993), Lickman’s use of “fixed-points” (1995), and Johnson’s integration of memoization with continuation-passing style (1995) all terminate for left-recursion but have exponential complexity.
− Norvig’s (1991) and Frost’s (1994) techniques for automatic memoization in parsing.
− Hutton and Meijer (1995): monadic parser combinators.
− Swierstra et al. (1991 and 1998) and De Moor et al. (2000): “first-class attributes”. JastAdd (Ekman, 2006) is a compiler-compiler AG system for Java.
− Frost (2002): an Attribute Grammar programming environment. Also, YAG (McRoy et al., 2003), in which AGs are used to correct partially-specified input.

27 Comparison with Related Work
– Our approach is directly executable, and specifications do not need to be compiled.
– Our top-down syntax-driven parsing strategy strictly preserves the syntactic structures of general ambiguous CFGs.
– Our approach accommodates fully declarative semantics with arbitrary dependencies for general syntax, following the original definition of AGs.
– Our memoization is specialized to perform extra tasks, e.g. keeping track of non-terminals’ context information, merging syntactic ambiguity, and mapping and grouping attributes.
– Our approach differs from other NL processors by being a one-pass parsing system that can return either compactly-represented parse trees annotated with attribute values or just a set of final answers, according to what is demanded by the application.

28 Future Work − Construct formal correctness proofs, and optimize the implementation w.r.t. grammar size. − Use semantic rules to prune out non-sensible parses. − Model NL features that can be characterized by other grammar formalisms. Project Website X-SAIGA – Executable SpecifIcations of Grammars cs.uwindsor.ca/~hafiz/proHome.html A version of demo code can be found at: