Learning to Transform Natural to Formal Languages
Rohit J. Kate, Yuk Wah Wong, and Raymond J. Mooney
Presented by Ping Zhang

Overview (May 13th)
- Background
- SILT
- CLANG and GEOQUERY
- Semantic parsing using transformation rules
- String-based learning
- Tree-based learning
- Experiments
- Future work
- Conclusion

Natural Language Processing (NLP)
Natural language is human language, e.g. English.
The reason to process NL: to provide a much more user-friendly interface.
Problems: NL is too complex and has many ambiguities. To date, NL cannot be used to program a computer.

Classification of Languages
Traditional classification (Chomsky Hierarchy):
- Regular grammars
- Context-free grammars — formal languages
- Context-sensitive grammars
- Unrestricted grammars — natural language
All current programming languages are less flexible than context-sensitive languages. For example, C++ is a restricted context-sensitive language.

An Approach to Processing NL
Map a natural language to a formal query or command language. NL interfaces to complex computing and AI systems can then be developed more easily.
English → (Map) → Formal Language → Compiler / Interpreter

Grammar Terms
A grammar G = (N, T, S, P):
- N: finite set of non-terminal symbols
- T: finite set of terminal symbols
- S: starting non-terminal symbol, S ∈ N
- P: finite set of productions
A production has the form x → y, for example:
  Noun → "computer"
  AssignmentStatement → i := 10;
  Statements → Statement ; Statements
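The 4-tuple above can be sketched as a small data structure. This is a minimal illustration, not tied to CLANG or GEOQUERY; the toy grammar and the `expand` helper are invented for this example.

```python
# A minimal sketch of the 4-tuple grammar G = (N, T, S, P) in Python.
# The productions below are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Grammar:
    nonterminals: set                                 # N
    terminals: set                                    # T
    start: str                                        # S, with S in N
    productions: list = field(default_factory=list)   # P: (lhs, rhs) pairs

g = Grammar(
    nonterminals={"S", "NP", "Noun"},
    terminals={"the", "computer"},
    start="S",
    productions=[("NP", ["the", "Noun"]), ("Noun", ["computer"])],
)

def expand(grammar, symbol):
    """Expand a non-terminal using the first matching production."""
    for lhs, rhs in grammar.productions:
        if lhs == symbol:
            return rhs
    return [symbol]  # terminals expand to themselves

print(expand(g, "Noun"))  # ['computer']
```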

SILT
SILT: Semantic Interpretation by Learning Transformations.
Transformation rules map substrings in NL sentences, or subtrees in their corresponding syntactic parse trees, to subtrees of the formal-language parse tree.
SILT learns transformation rules from training data: pairs of NL sentences and manually translated formal-language statements.
Two target formal languages: CLANG and GEOQUERY.

CLANG
A formal language used for coaching robotic soccer in the RoboCup Coach Competition. The CLANG grammar consists of 37 non-terminals and 133 productions. All tactics and behaviors are expressed in terms of if-then rules.
An example:
  ((bpos (penalty-area our)) (do (player-except our {4}) (pos (half our))))
  "If the ball is in our penalty area, all our players except player 4 should stay in our half."
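The nested if-then structure of CLANG rules can be read with a generic s-expression reader. This is a sketch: `tokenize` and `read` are illustrative helpers, not the official CLANG parser.

```python
# Read a CLANG-style s-expression into nested Python lists.
def tokenize(text):
    for ch in "(){}":
        text = text.replace(ch, f" {ch} ")
    return text.split()

def read(tokens):
    tok = tokens.pop(0)
    if tok in "({":
        close = ")" if tok == "(" else "}"
        out = []
        while tokens[0] != close:
            out.append(read(tokens))
        tokens.pop(0)  # drop the closing delimiter
        return out
    return tok

rule = "((bpos (penalty-area our)) (do (player-except our {4}) (pos (half our))))"
parsed = read(tokenize(rule))
print(parsed[0])  # ['bpos', ['penalty-area', 'our']]
```

The condition and the directive fall out as the first and second elements of the top-level list, mirroring the if-then reading of the rule.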

GEOQUERY
A database query language for a small database of U.S. geography containing about 800 facts. Based on Prolog, augmented with meta-predicates.
An example:
  answer(A, count(B, (city(B), loc(B, C), const(C, countryid(usa))), A))
  "How many cities are there in the US?"

Two Methods
String-based transformation learning: directly maps strings of the NL sentence to the parse tree of the formal language.
Tree-based transformation learning: maps subtrees to subtrees between the two languages; assumes a syntactic parse tree and parser for the NL sentences are provided.

Semantic Parsing
Semantic parsing works by pattern matching: patterns found in NL are matched against templates based on productions, rewriting NL phrases into formal expressions.
Rule representation for the two methods:
  String-based: "TEAM UNUM has the ball"  ⇒  CONDITION → (bowner TEAM {UNUM})
  Tree-based: the subtree (S (NP TEAM UNUM) (VP (VBZ has) (NP (DT the) (NN ball))))  ⇒  CONDITION → (bowner TEAM {UNUM})

Examples of Parsing
1. "If our player 4 has the ball, our player 4 should shoot."
2. "If TEAM UNUM has the ball, TEAM UNUM should ACTION."  (TEAM=our, UNUM=4, ACTION=(shoot))
3. "If CONDITION, TEAM UNUM should ACTION."  (CONDITION=(bowner our {4}), TEAM=our, UNUM=4, ACTION=(shoot))
4. "If CONDITION, DIRECTIVE."  (CONDITION=(bowner our {4}), DIRECTIVE=(do our {4} (shoot)))
5. RULE: ((bowner our {4}) (do our {4} (shoot)))
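The rewriting steps above can be sketched as repeated pattern substitution. This is a minimal illustration using regular expressions; the rule patterns and the `CONDITION[...]` bracketing are invented stand-ins for SILT's learned rules, not its actual representation.

```python
# Bottom-up rewriting of an NL sentence into a formal expression.
# Each rule pairs an NL pattern (with variables) with a production template.
import re

rules = [
    (r"our player (\d+)", r"TEAM[our] UNUM[\1]"),
    (r"TEAM\[(\w+)\] UNUM\[(\w+)\] has the ball",
     r"CONDITION[(bowner \1 {\2})]"),
    (r"TEAM\[(\w+)\] UNUM\[(\w+)\] should shoot",
     r"DIRECTIVE[(do \1 {\2} (shoot))]"),
    (r"if CONDITION\[(.+?)\], DIRECTIVE\[(.+?)\]\.",
     r"RULE[(\1 \2)]"),
]

def parse(sentence):
    s = sentence.lower()
    for pattern, template in rules:  # apply rules bottom-up, in order
        s = re.sub(pattern, template, s)
    return s

print(parse("If our player 4 has the ball, our player 4 should shoot."))
# RULE[((bowner our {4}) (do our {4} (shoot)))]
```

Each substitution mirrors one numbered step on the slide: first the NL phrases are abstracted into variables, then conditions and directives are assembled, and finally the whole RULE is produced.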

Variations of Rule Representation
- SILT allows patterns to skip some words or nodes, to deal with non-compositionality: "if CONDITION, DIRECTIVE." can skip an intervening "then".
- SILT allows constraints on rules: "in REGION" matches "CONDITION → (bpos REGION)" only if "in REGION" follows "the ball".
- SILT allows templates with multiple productions: "TEAM player UNUM has the ball in REGION" ⇒ CONDITION → (and (bowner TEAM UNUM) (bpos REGION))

Learning Transformation Rules
Input: a training set T of NL sentences paired with formal representations; a set Π of productions in the formal grammar.
Output: a learned rule base L.
Algorithm:
  Parse all formal representations in T using Π.
  Collect positive examples P(π) and negative examples N(π) for all π ∈ Π.
  L = ∅
  Until all positive examples are covered, or no more good rules can be found for any π ∈ Π, do:
    R' = FindBestRules(Π, P, N)
    L = L ∪ R'
    Apply rules in L to sentences in T.
Given an NL sentence S and a production π:
  P: if π is used in the formal expression of S, then S is a positive example for π.
  N: if π is not used in the formal expression of S, then S is a negative example for π.
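The outer loop can be sketched as a greedy covering algorithm over candidate rules. This is a simplification: the real system learns rules per production and scores them with a goodness function; `Rule` and `learn_rules` are invented names for illustration.

```python
# Greedy covering: repeatedly pick the candidate rule that covers the most
# still-uncovered training sentences, until everything is covered or no
# candidate helps.
from collections import namedtuple

Rule = namedtuple("Rule", ["pattern", "covers"])  # covers: set of sentence ids

def learn_rules(examples, candidate_rules):
    learned = []
    uncovered = set(examples)
    while uncovered:
        best = max(candidate_rules,
                   key=lambda r: len(r.covers & uncovered),
                   default=None)
        if best is None or not (best.covers & uncovered):
            break  # no more good rules can be found
        learned.append(best)
        uncovered -= best.covers
    return learned

demo = learn_rules({1, 2, 3}, [Rule("a", {1, 2}), Rule("b", {2, 3})])
print([r.pattern for r in demo])  # ['a', 'b']
```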

Issues of SILT Learning
- Non-compositionality.
- Rule cooperation: rules are learned in order, so an over-general ancestor leads to a group of over-general child rules, and no rule can cooperate with such rules. Two approaches can solve this:
  1. Find the single best rule among all competing productions in each iteration.
  2. Over-generate rules, then find a subset that cooperates.

FindBestRule() for String-based Learning
Input: a set Π of productions in the formal grammar; sets of positive examples P(π) and negative examples N(π) for each π ∈ Π.
Output: the best rule BR.
Algorithm:
  R = ∅
  For each production π ∈ Π:
    Let Rπ be the maximally specific rules derived from P(π).
    Repeat k = 1000 times:
      Choose r1, r2 ∈ Rπ at random.
      g = GENERALIZE(r1, r2, π)
      Add g to Rπ.
    R = R ∪ Rπ
  BR = argmax_{r ∈ R} goodness(r)
  Remove positive examples covered by BR from P.

FindBestRule() (Cont.)
goodness(r) scores a candidate rule by how well it covers positive examples while avoiding negatives.
GENERALIZE(r1, r2): r1 and r2 are two transformation rules based on the same production. For example:
  π: REGION → (penalty-area TEAM)
  pattern 1: TEAM 's penalty box
  pattern 2: TEAM penalty area
  generalization: TEAM penalty
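One simple way to realize this kind of string-pattern generalization is a longest common subsequence over tokens. This is an assumed simplification of SILT's GENERALIZE; the function name is illustrative.

```python
# Generalize two token patterns to their longest common subsequence.
def generalize(pattern1, pattern2):
    a, b = pattern1.split(), pattern2.split()
    m, n = len(a), len(b)
    # Classic LCS dynamic program over the two token sequences.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if a[i] == b[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    # Backtrack to recover the common subsequence.
    out, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return " ".join(reversed(out))

print(generalize("TEAM 's penalty box", "TEAM penalty area"))  # TEAM penalty
```

On the slide's example, the shared tokens TEAM and penalty survive, reproducing the generalization "TEAM penalty".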

Tree-based Learning
Uses a similar FindBestRules() algorithm, but GENERALIZE finds the largest common subgraphs of the two rules. For example:
  π: REGION → (penalty-area TEAM)
  pattern 1: (NP (NP TEAM (POS 's)) (NN penalty) (NN box))
  pattern 2: (NP (PRP$ TEAM) (NN penalty) (NN area))
  generalization: (NP TEAM (NN penalty))
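A minimal version of tree generalization can be sketched as a top-down intersection of two trees. This is an assumed simplification: children are aligned positionally here, whereas SILT searches for largest common subgraphs.

```python
# Generalize two parse-tree patterns, represented as (label, children...)
# tuples with strings at the leaves, by keeping only the parts they share.
def generalize_trees(t1, t2):
    if isinstance(t1, str) or isinstance(t2, str):
        return t1 if t1 == t2 else None      # leaves must match exactly
    if t1[0] != t2[0]:
        return None                           # node labels must agree
    kept = [g for g in (generalize_trees(c1, c2)
                        for c1, c2 in zip(t1[1:], t2[1:]))
            if g is not None]
    return (t1[0], *kept) if kept else None   # drop nodes with nothing shared

p1 = ("NP", ("NP", "TEAM", ("POS", "'s")), ("NN", "penalty"), ("NN", "box"))
p2 = ("NP", ("PRP$", "TEAM"), ("NN", "penalty"), ("NN", "area"))
print(generalize_trees(p1, p2))  # ('NP', ('NN', 'penalty'))
```

On these two patterns the intersection keeps the NP root and the shared (NN penalty) child, close to the slide's generalization.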

Experiments
For CLANG: 300 pieces selected randomly from log files of the 2003 RoboCup Coach Competition; each formal instruction was translated into English by humans. Average length of an NL sentence is words.
For GEOQUERY: 250 questions collected from undergraduate students; all English queries were translated manually. Average length of an NL sentence is 6.87 words.

Results for CLANG

Results for CLANG (Cont.)

Results for GEOQUERY

Results for GEOQUERY (Cont.)

Time Consumption
Training time, in minutes.

Future Work
- Though improved, SILT still lacks the robustness of statistical parsing; its hard-matching symbolic rules are sometimes too brittle.
- A more unified implementation of tree-based SILT that allows directly comparing and evaluating the benefit of using initial syntactic parsers.

Conclusion
SILT is a novel approach that learns transformation rules mapping NL sentences into a formal language. It shows better overall performance than previous approaches. NLP still has a long way to go.

Thank you! Questions or comments?