Adapting Discriminative Reranking to Grounded Language Learning — Joohyun Kim and Raymond J. Mooney, Department of Computer Science, The University of Texas at Austin


Adapting Discriminative Reranking to Grounded Language Learning
Joohyun Kim and Raymond J. Mooney
Department of Computer Science, The University of Texas at Austin
The 51st Annual Meeting of the Association for Computational Linguistics, August 5, 2013

Discriminative Reranking

An effective approach to improving the performance of generative models with a secondary discriminative model. Applied to various NLP tasks:
– Syntactic parsing (Collins, ICML 2000; Collins, ACL 2002; Charniak & Johnson, ACL 2005)
– Semantic parsing (Lu et al., EMNLP 2008; Ge & Mooney, ACL 2006)
– Part-of-speech tagging (Collins, EMNLP 2002)
– Semantic role labeling (Toutanova et al., ACL 2005)
– Named entity recognition (Collins, ACL 2002)
– Machine translation (Shen et al., NAACL 2004; Fraser & Marcu, ACL 2006)
– Surface realization in language generation (White & Rajkumar, EMNLP 2009; Konstas & Lapata, ACL 2012)

Goal: adapt discriminative reranking to grounded language learning.

Discriminative Reranking

Generative model: the trained model outputs the single best result, the candidate with maximum probability.

[Diagram: a testing example passes through the trained generative model, which returns the 1-best candidate with maximum probability.]

Discriminative Reranking

Can we do better? A secondary discriminative model picks the best out of the n-best candidates from the baseline model.

[Diagram: the testing example passes through the trained baseline generative model (GEN), producing n-best candidates (Candidate 1 … Candidate n); the trained secondary discriminative model selects the best prediction as output.]
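The selection step above can be sketched in code. This is a minimal sketch, not the actual model: feature names and the candidate structure are illustrative, and the real reranker scores full parse trees produced by the baseline generative model.

```python
def rerank(candidates, weights):
    """Return the candidate maximizing the linear score w . f(candidate)."""
    def score(candidate):
        return sum(weights.get(name, 0.0) * value
                   for name, value in candidate["features"].items())
    return max(candidates, key=score)

# Two hypothetical candidates with sparse indicator features:
candidates = [
    {"id": 1, "features": {"rule:Turn->LEFT": 1.0}},
    {"id": 2, "features": {"rule:Travel->steps": 1.0}},
]
weights = {"rule:Turn->LEFT": 0.5, "rule:Travel->steps": 1.2}
best = rerank(candidates, weights)  # candidate 2 scores higher under these weights
```

The discriminative model is free to use arbitrary global features of each candidate, which is exactly what the generative baseline cannot do.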

Discriminative Reranking

Training the secondary discriminative model.

[Diagram: a training example passes through the trained baseline generative model (GEN), yielding n-best training candidates (Candidate 1 … Candidate n), each with a probability.]

Discriminative Reranking

Training the secondary discriminative model: the discriminative model's parameters are updated by comparing the best predicted candidate against the gold-standard reference.

[Diagram: the trained baseline generative model (GEN) produces n-best training candidates; the secondary discriminative model makes its best prediction, which is compared against the gold-standard reference, and the comparison drives the parameter update.]
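The compare-and-update loop described above is the standard (structured) perceptron step. A minimal sketch with sparse dictionary feature vectors; the feature names are illustrative, not the model's actual features:

```python
def perceptron_update(weights, predicted_feats, gold_feats):
    """One perceptron step on a wrong prediction: move the weights toward
    the gold candidate's features and away from the predicted candidate's.
    Mutates and returns the weight dict."""
    for name, value in gold_feats.items():
        weights[name] = weights.get(name, 0.0) + value
    for name, value in predicted_feats.items():
        weights[name] = weights.get(name, 0.0) - value
    return weights

# If the model predicted a "turn right" parse but the gold parse turns left:
w = perceptron_update({}, {"rule:Turn->RIGHT": 1.0}, {"rule:Turn->LEFT": 1.0})
```

In the averaged perceptron used later in the talk, the final weights are the average over all intermediate weight vectors, which reduces overfitting to the last examples seen.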

Grounded Language Learning

The process of acquiring the semantics of natural language with respect to relevant perceptual contexts. Supervision is ambiguous, appearing only as the surrounding perceptual environment:
– Not a typical supervised learning task
– Only one or a few of the perceptual contexts are relevant
– No single gold standard per training example

No standard discriminative reranking is available!

Navigation Task (Chen & Mooney, 2011)

Learn to interpret and follow navigation instructions, e.g., "Go down this hall and make a right when you see an elevator to your left."
– Use virtual worlds and instructor/follower data from MacMahon et al. (2006)
– No prior linguistic knowledge
– Infer language semantics by observing how humans follow instructions

Sample Environment (Chen & Mooney, 2011)

[Map diagram. Legend: H – Hat Rack, L – Lamp, E – Easel, S – Sofa, B – Barstool, C – Chair]

Executing Test Instruction

Sample Navigation Instruction

Instruction: "Take your first left. Go all the way down until you hit a dead end."

[Map diagram showing the start position, the route through the environment, and the end position.]

Sample Navigation Instruction

Instruction: "Take your first left. Go all the way down until you hit a dead end."

Observed primitive actions: Forward, Turn Left, Forward, Forward

Encountered environments: back: BLUE HALLWAY, front: BLUE HALLWAY, left: CONCRETE HALLWAY, right/back/front: YELLOW HALLWAY, front/back: HATRACK, right: CONCRETE HALLWAY, front: WALL, right/left: WALL


Sample Navigation Instruction

Alternative instructions for the same route:
– "Go towards the coat hanger and turn left at it. Go straight down the hallway and the dead end is position 4."
– "Walk to the hat rack. Turn left. The carpet should have green octagons. Go to the end of this alley. This is p-4."
– "Walk forward once. Turn left. Walk forward twice."

Task Objective

Learn the underlying meanings of instructions by observing human actions:
– Map natural-language (NL) instructions into correct formal plans of actions (meaning representations, MRs)

Learn under high ambiguity:
– Training input is pairs of NL instructions and landmarks plans (Chen & Mooney, 2011)
– A landmarks plan describes the actions taken in the environment along with notable objects encountered on the way
– It overestimates the meaning of the instruction, including unnecessary details
– Only a subset of the plan is relevant to the instruction

Challenge

Instruction: "at the easel, go left and then take a right onto the blue path at the corner"

Landmarks plan:
Travel ( steps: 1 ),
Verify ( at: EASEL, side: CONCRETE HALLWAY ),
Turn ( LEFT ),
Verify ( front: CONCRETE HALLWAY ),
Travel ( steps: 1 ),
Verify ( side: BLUE HALLWAY, front: WALL ),
Turn ( RIGHT ),
Verify ( back: WALL, front: BLUE HALLWAY, front: CHAIR, front: HATRACK, left: WALL, right: EASEL )


Challenge

Instruction: "at the easel, go left and then take a right onto the blue path at the corner"

Correct plan (the subset of the landmarks plan that is actually relevant to the instruction):
Travel ( steps: 1 ),
Verify ( at: EASEL ),
Turn ( LEFT ),
Travel ( steps: 1 ),
Verify ( front: WALL ),
Turn ( RIGHT ),
Verify ( front: BLUE HALLWAY )

Exponential number of possibilities: a combinatorial matching problem between the instruction and the landmarks plan.

Baseline Generative Model

PCFG induction model for grounded language learning (Kim & Mooney, EMNLP 2012):
– Transforms grounded language learning into a standard PCFG grammar induction task
– Uses a set of pre-defined PCFG conversion rules to capture the probabilistic relationship between formal meaning representations (MRs) and natural-language (NL) phrases
– Uses a semantic lexicon to help define the generative process: larger semantic concepts (MRs) hierarchically generate smaller concepts and finally NL phrases

Generative Process

[Diagram: a context MR — a sequence of Turn, Travel, and Verify actions with arguments such as Turn(LEFT), Verify(front: BLUE HALL, front: EASEL, left: HATRACK), Travel(steps: 2), Verify(at: SOFA), Turn(RIGHT), Verify(at: CHAIR) — generates relevant lexemes (L1, L2), which in turn generate the NL words, e.g., "go", "to", "the", "sofa", "and", "left".]

How can we apply discriminative reranking?

It is impossible to apply standard discriminative reranking to grounded language learning:
– No single gold-standard reference exists for each training example
– There is only weak supervision from the surrounding perceptual context (the landmarks plan)

Use response feedback from the perceptual world:
– Evaluate candidate formal meaning representations (MRs) by executing them in simulated worlds, as is done when evaluating the final end task, plan execution
– The response is a weak indication of whether a candidate is good or bad
– Multiple candidate parses are used for the parameter update, since the response signal is weak and distributed over all candidates

Reranking Model: Averaged Perceptron (Collins, ICML 2000)

The parameter weight vector is updated whenever the trained model predicts a wrong candidate.

[Diagram: the trained baseline generative model (GEN) produces n-best candidates for a training example; the best prediction is compared against the gold-standard reference, and the feature-vector difference drives the weight update.]

Reranking Model: Averaged Perceptron (Collins, ICML 2000)

Applied to our baseline model on the navigation task; the candidates are parse trees from the baseline model (Kim & Mooney, 2012).

[Diagram: the same training setup, with the baseline generative model producing n-best candidate parse trees.]

Response-based Weight Update

A single gold-standard reference parse does not exist for each training example, so we pick a pseudo-gold parse out of all candidates:
– Evaluate the MR plans composed from the candidate parses
– The MARCO execution module (MacMahon et al., AAAI 2006) runs and evaluates each candidate MR in the world (also used for evaluating the end goal, plan execution performance)
– Record the execution success rate: whether each candidate MR reaches the intended destination (MARCO is nondeterministic, so average over 10 trials)
– Prefer the candidate with the best success rate during training
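The pseudo-gold selection above can be sketched as follows. This is a sketch under stated assumptions: `execute` stands in for the MARCO execution module (not reimplemented here) and is assumed to run one trial of a candidate MR plan and report whether the intended destination was reached.

```python
def execution_success_rate(mr, execute, trials=10):
    """Average success over several trials, since MARCO is nondeterministic."""
    return sum(1.0 for _ in range(trials) if execute(mr)) / trials

def pick_pseudo_gold(candidate_mrs, execute, trials=10):
    """Prefer the candidate MR with the best execution success rate."""
    return max(candidate_mrs,
               key=lambda mr: execution_success_rate(mr, execute, trials))

# Toy deterministic stand-in for the execution module:
reaches_goal = {"plan_a": False, "plan_b": True}
pseudo_gold = pick_pseudo_gold(["plan_a", "plan_b"], lambda mr: reaches_goal[mr])
```

The selected pseudo-gold parse then plays the role of the gold-standard reference in the perceptron update.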

Response-based Update

Select the pseudo-gold reference based on MARCO execution results.

[Diagram: MRs derived from the n-best candidates are run through the MARCO execution module; the candidate with the highest execution success rate becomes the pseudo-gold reference, and the feature-vector difference between it and the best prediction drives the update.]

Weight Update with Multiple Parses

Candidates other than the pseudo-gold one can also be useful:
– Multiple parses may share the same maximum execution rate
– A low execution rate can still correspond to a correct plan, given the indirect supervision of human follower actions: MR plans may be underspecified or have ignorable details attached, and are sometimes inaccurate but still contain the correct MR components needed to reach the desired goal

Weight update with multiple candidate parses:
– Use every candidate with a higher execution rate than the currently best-predicted candidate
– Weight each feature-difference update by the difference between the execution rates
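The multi-parse variant described above can be sketched as follows. A minimal sketch: candidates are represented as dicts with an execution "rate" and a sparse "features" vector, and the names are illustrative rather than the model's actual representation.

```python
def multi_parse_update(weights, predicted, candidates):
    """Update weights toward all candidates outperforming the prediction,
    scaling each feature difference by the gap in execution rates."""
    for cand in candidates:
        gap = cand["rate"] - predicted["rate"]
        if gap <= 0:
            continue  # only candidates with a strictly higher rate contribute
        for name in set(cand["features"]) | set(predicted["features"]):
            diff = (cand["features"].get(name, 0.0)
                    - predicted["features"].get(name, 0.0))
            weights[name] = weights.get(name, 0.0) + gap * diff
    return weights
```

Scaling by the rate gap means a candidate that barely beats the prediction nudges the weights slightly, while a far more successful candidate moves them strongly.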

Weight Update with Multiple Parses

Update the weights using every candidate whose execution rate exceeds that of the currently predicted parse.

[Diagram: MRs derived from the n-best candidates are scored by the MARCO execution module; each candidate beating the best prediction's execution rate contributes a weight update.]


Features

Binary indicators of whether a certain composition of nonterminals appears in the parse tree (Collins, EMNLP 2002; Lu et al., EMNLP 2008; Ge & Mooney, ACL 2006).

Example sentence: "Turn left and find the sofa then turn around the corner"

Example lexeme nonterminals:
L1: Turn(LEFT), Verify(front: SOFA, back: EASEL), Travel(steps: 2), Verify(at: SOFA), Turn(RIGHT)
L2: Turn(LEFT), Verify(front: SOFA)
L3: Travel(steps: 2), Verify(at: SOFA), Turn(RIGHT)
L4: Turn(LEFT)
L5: Travel(), Verify(at: SOFA)
L6: Turn()
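Extracting such indicator features can be sketched as follows. The tree encoding here is an assumption for illustration: a node is a (label, children) pair and a leaf word is a plain string, which is not necessarily the representation used in the actual system.

```python
def parse_tree_features(tree):
    """Collect one binary indicator feature per parent -> children
    composition appearing anywhere in the parse tree."""
    feats = {}
    def walk(node):
        if isinstance(node, str):
            return  # leaf word: no rule fires here
        label, children = node
        child_labels = [c if isinstance(c, str) else c[0] for c in children]
        feats["rule:%s->%s" % (label, ",".join(child_labels))] = 1.0
        for child in children:
            walk(child)
    walk(tree)
    return feats

# Hypothetical fragment: a lexeme nonterminal L4 covering "turn left".
feats = parse_tree_features(("L4", [("Turn", ["turn"]), ("LEFT", ["left"])]))
```

Each distinct composition becomes its own weight in the reranker, so frequently useful compositions accumulate positive weight during training.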

Data

3 maps, 6 instructors, 1–15 followers per direction. Instructions are segmented into single-sentence steps to make learning easier (Chen & Mooney, 2011), and each single-sentence instruction is aligned with a landmarks plan. The single-sentence version is used for training; both the paragraph and single-sentence versions are used for testing.

Example (paragraph): "Take the wood path towards the easel. At the easel, go left and then take a right on the the blue path at the corner. Follow the blue path towards the chair and at the chair, take a right towards the stool. When you reach the stool, you are at 7."
Actions: Turn, Forward, Turn left, Forward, Turn right, Forward x 3, Turn right, Forward

Example (single sentence): "Take the wood path towards the easel." — Actions: Turn, Forward
"At the easel, go left and then take a right on the the blue path at the corner." — Actions: Turn left, Forward, Turn right

Evaluations

Leave-one-map-out approach: 2 maps for training and 1 map for testing.
– Parse accuracy: evaluate how good the derived MR is when parsing novel sentences in the test data, using partial parse accuracy as the metric
– Plan execution accuracy (end goal): test how well the formal MR plan output reaches the destination; only successful if the final position matches exactly

Compared with Kim & Mooney (2012) as the baseline:
– All reranking results use 50-best parses
– We collect the 50-best distinct composed MR plans (and their corresponding parses) from the 1,000,000-best parses, since many parse trees differ insignificantly and lead to the same derived MR plan; the baseline model generates the sufficiently large 1,000,000-best parse list
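The deduplication step above can be sketched as follows. A minimal sketch: `derive_mr` stands in for composing the MR plan from a parse tree, which the actual system performs with its grammar; the function and data names are illustrative.

```python
def distinct_mr_candidates(parses, derive_mr, k=50):
    """From a long n-best parse list (best first), keep the first parse
    seen for each distinct derived MR plan, up to k candidates."""
    seen, kept = set(), []
    for parse in parses:
        mr = derive_mr(parse)
        if mr in seen:
            continue  # this parse collapses to an MR we already have
        seen.add(mr)
        kept.append(parse)
        if len(kept) == k:
            break
    return kept
```

Because the n-best list is ordered by probability, keeping the first parse per MR retains the most probable derivation of each distinct plan.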

Response-based Update vs. Baseline

[Table: Parse Accuracy (F1) and Plan Execution accuracy (Single-sentence and Paragraph) for the Baseline, Gold-Standard, and Response models.]

Response vs. Baseline:
– The response-based approach performs better on the final end task, plan execution
– It optimizes the model directly against plan execution

Response-based vs. Gold-Standard Update

[Table: Parse Accuracy (F1) and Plan Execution accuracy (Single-sentence and Paragraph) for the Baseline, Gold-Standard, and Response models.]

Gold-standard update:
– Gold-standard data is available only for evaluation purposes; grounded language learning does not provide it for training

Response-based vs. gold-standard update:
– The gold-standard update is better in parse accuracy
– The response-based approach is better in plan execution
– The gold standard misses some MR elements that are critical for reaching the goal

Reranking is thus possible even when no gold-standard reference exists for the training data: use responses from the perceptual environment instead, tied to the end task.

Response-based Update with Multiple vs. Single Parses

[Table: Parse Accuracy (F1) and Plan Execution accuracy (Single-sentence and Paragraph) for single-parse and multi-parse updates.]

Using multiple parses is better than using a single parse:
– The single best pseudo-gold parse provides only weak feedback
– Candidates with low execution rates mostly produce underspecified plans or plans with ignorable details, but still capture the gist of the preferred actions
– A variety of preferable parses improves the amount and quality of the weak feedback, yielding a better model

Conclusion

We adapt discriminative reranking to grounded language learning:
– No single gold-standard parse is available during training
– Response-based feedback, provided by natural responses from the perceptual world, is a viable alternative
– The weak supervision from response feedback can be improved by using multiple preferable parses

Thank you for your time! Questions?