Learning to Interpret Natural Language Navigation Instructions from Observation
Ray Mooney, Department of Computer Science, University of Texas at Austin

Presentation transcript:

1 Learning to Interpret Natural Language Navigation Instructions from Observation
Ray Mooney, Department of Computer Science, University of Texas at Austin
Joint work with David Chen, Joohyun Kim, and Lu Guo

2 Challenge Problem: Learning to Follow Directions in a Virtual World Learn to interpret navigation instructions in a virtual environment by simply observing humans giving and following such directions (Chen & Mooney, AAAI-11). Eventual goal: Virtual agents in video games and educational software that automatically learn to take and give instructions in natural language.

Sample Environment (MacMahon et al., AAAI-06)
[Map diagram]
Legend: H – Hat Rack, L – Lamp, E – Easel, S – Sofa, B – Barstool, C – Chair

Sample Instructions
- "Take your first left. Go all the way down until you hit a dead end."
- "Go towards the coat hanger and turn left at it. Go straight down the hallway and the dead end is position 4."
- "Walk to the hat rack. Turn left. The carpet should have green octagons. Go to the end of this alley. This is p-4."
- "Walk forward once. Turn left. Walk forward twice."
[Map diagram: route from Start to End]

Sample Instructions
- "Take your first left. Go all the way down until you hit a dead end."
- "Go towards the coat hanger and turn left at it. Go straight down the hallway and the dead end is position 4."
- "Walk to the hat rack. Turn left. The carpet should have green octagons. Go to the end of this alley. This is p-4."
- "Walk forward once. Turn left. Walk forward twice."
Observed primitive actions: Forward, Left, Forward, Forward
[Map diagram: route from Start to End]

Observed Training Instance in Chinese

Executing Test Instance in English

Formal Problem Definition
Given: {(e1, w1, a1), (e2, w2, a2), …, (en, wn, an)}
- ei: a natural language instruction
- wi: a world state
- ai: an observed action sequence
Goal: Build a system that produces the correct aj given a previously unseen (ej, wj).
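The problem definition above can be pictured in code as a list of (instruction, world state, action sequence) triples; the class and field names below are illustrative, not from the talk.

```python
from dataclasses import dataclass

@dataclass
class Example:
    instruction: str        # e_i: natural-language instruction
    world_state: dict       # w_i: observable state of the virtual world
    actions: list           # a_i: observed primitive action sequence

# A toy training set in the spirit of the slides.
train = [
    Example("Turn left. Walk forward twice.",
            {"front": "BLUE HALL"},
            ["Left", "Forward", "Forward"]),
]

# Goal: learn f such that f(e_j, w_j) == a_j on previously unseen pairs.
def accuracy(f, examples):
    return sum(f(ex.instruction, ex.world_state) == ex.actions
               for ex in examples) / len(examples)
```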

[System diagram — "Learning system for parsing navigation instructions", built up over several slides: During training, each observation consists of an Instruction, a World State, and an observed Action Trace. A Navigation Plan Constructor converts these observations into training data for a Semantic Parser Learner, which produces a Semantic Parser. During testing, the Semantic Parser maps a new Instruction and World State to a plan, which the Execution Module (MARCO) executes to produce an Action Trace.]

Representing Linguistic Context
Context is represented by the sequence of observed actions, each followed by verifying all observable aspects of the resulting world state.
[Plan graph: Turn(LEFT) → Verify → Travel(steps: 2) → Verify, with verified attributes front: BLUE HALL, front: SOFA, at: SOFA]

Possible Plans
An instruction can refer to a combinatorial number of possible plans, each composed of some subset of this full contextual description.
[Plan graph as on the previous slide]
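The combinatorial blow-up is easy to see with a small sketch: even a hypothetical four-component context yields a candidate plan for every non-empty subset of its components.

```python
from itertools import combinations

# Hypothetical components of the full context for "Turn left. Walk forward twice."
components = ["Turn(LEFT)", "Verify(front: BLUE HALL)",
              "Travel(steps: 2)", "Verify(at: SOFA)"]

# Every non-empty subset of components is a candidate meaning.
candidates = [subset
              for r in range(1, len(components) + 1)
              for subset in combinations(components, r)]
print(len(candidates))  # 2^4 - 1 = 15 candidates from only 4 components
```

Real contexts have many more components, so exhaustive enumeration (and hence EM over enumerated meanings) quickly becomes intractable.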

Possible Plan #1: "Turn and walk to the couch"
[Plan graph with the subgraph for this sentence highlighted]

Possible Plan #2: "Face the blue hall and walk 2 steps"
[Plan graph with the subgraph for this sentence highlighted]

Possible Plan #3: "Turn left. Walk forward twice."
[Plan graph with the subgraph for this sentence highlighted]

Disambiguating Sentence Meaning
There are too many meanings to tractably enumerate them all, so we cannot use EM to align sentences with enumerated meanings and thereby disambiguate the training data.

[Revised system diagram — "Learning system for parsing navigation instructions": the Navigation Plan Constructor is replaced by a Context Extractor feeding a Lexicon Learner and a Plan Refinement component. As before, training observations (Instruction, World State, Action Trace) supply the Semantic Parser Learner, and at test time the learned Semantic Parser's output is executed by the Execution Module (MARCO) to produce an Action Trace.]

Lexicon Learning
Learn meanings of words and short phrases by finding correlations with meaning fragments.
[Diagram: phrases "face", "blue hall", "2 steps", and "walk" linked to meaning fragments Turn, Verify(front: BLUE HALL), and Travel(steps: 2)]

Lexicon Learning Algorithm
To learn the meaning of a word or short phrase w:
1. Collect all landmark plans that co-occur with w and add them to the set PosMean(w).
2. Repeatedly take intersections of all possible pairs of members of PosMean(w) and add any new entries g to PosMean(w).
3. Rank the entries by the scoring function (given as a formula on the original slide).
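A minimal sketch of the algorithm above, simplifying each landmark plan to a flat set of atomic components so that graph intersection reduces to set intersection. The slide's scoring formula is not preserved in this transcript; the sketch assumes a correlational score of the form p(g|w) − p(g|¬w), in the spirit of Chen & Mooney (2011).

```python
from itertools import combinations

def learn_lexicon(pairs, min_score=0.0):
    """pairs: (n-gram w, co-occurring landmark plan) observations, where each
    plan is simplified to a frozenset of atomic components."""
    pos_mean = {}                              # PosMean(w)
    for w, plan in pairs:
        pos_mean.setdefault(w, set()).add(plan)
    # Step 2: close PosMean(w) under pairwise intersection.
    for cands in pos_mean.values():
        changed = True
        while changed:
            changed = False
            for g1, g2 in combinations(list(cands), 2):
                g = g1 & g2
                if g and g not in cands:
                    cands.add(g)
                    changed = True
    # Step 3: rank entries by an assumed score p(g|w) - p(g|not w).
    lexicon = []
    for w, cands in pos_mean.items():
        with_w = [p for x, p in pairs if x == w]
        without_w = [p for x, p in pairs if x != w]
        for g in cands:
            p_w = sum(g <= p for p in with_w) / len(with_w)
            p_nw = (sum(g <= p for p in without_w) / len(without_w)
                    if without_w else 0.0)
            if p_w - p_nw > min_score:
                lexicon.append((w, g, p_w - p_nw))
    return sorted(lexicon, key=lambda entry: -entry[2])

# Toy data: two contexts for "walk to the sofa" and one for "turn left".
pairs = [
    ("walk to the sofa",
     frozenset({("Turn", "LEFT"), ("Travel",), ("Verify", "at", "SOFA")})),
    ("walk to the sofa",
     frozenset({("Travel",), ("Verify", "at", "SOFA")})),
    ("turn left", frozenset({("Turn", "LEFT")})),
]
lexicon = learn_lexicon(pairs)
```

On this toy data the top-ranked entry pairs "walk to the sofa" with {Travel, Verify(at: SOFA)}, since that fragment occurs in all of the phrase's contexts and in none of the others.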

Graph Intersection
Graph 1: "Turn and walk to the sofa."
Graph 2: "Walk to the sofa and turn left."
[Diagram: the two full landmark plan graphs]
Intersections:
- Turn(LEFT), Verify(front: BLUE HALL)
- Travel, Verify(at: SOFA)

Plan Refinement
Use the learned lexicon to determine the subset of the context representing the sentence meaning.
Example: "Face the blue hall and walk 2 steps"
[Animation: lexicon-matched components are highlighted step by step within the full plan graph]

Evaluation Data Statistics
3 maps, 6 instructors, 1–15 followers per direction. Hand-segmented into single-sentence steps.

                    Paragraph      Single-Sentence
# Instructions      706            3,236
Avg. # sentences    5.0 (±2.8)     1.0 (±0)
Avg. # words        37.6 (±21.1)   7.8 (±5.1)
Avg. # actions      10.4 (±5.7)    2.1 (±2.4)

End-to-End Execution Evaluation
Test how well the system follows novel directions, using leave-one-map-out cross-validation. Strict metric: only correct if the final position exactly matches the goal location.
Lower baselines:
- Simple probabilistic generative model of executed plans without language
- Semantic parser trained on full context plans
Upper baselines:
- Semantic parser trained on human-annotated plans
- Human followers
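A sketch of this evaluation protocol; the record layout and map names are illustrative, not from the talk.

```python
def leave_one_map_out(examples):
    """Yield (held_out_map, train, test) splits, holding out each map in turn."""
    maps = sorted({ex["map"] for ex in examples})
    for held_out in maps:
        train = [ex for ex in examples if ex["map"] != held_out]
        test = [ex for ex in examples if ex["map"] == held_out]
        yield held_out, train, test

def strict_accuracy(final_positions, goal_positions):
    """Strict metric: correct only on an exact final-position match."""
    hits = sum(p == g for p, g in zip(final_positions, goal_positions))
    return hits / len(goal_positions)
```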

End-to-End Execution Accuracy
                                   Single-Sentence   Paragraph
Simple Generative Model            11.08%            2.15%
Trained on Full Context            21.95%            2.66%
Trained on Refined Plans           57.28%            19.18%
Trained on Human Annotated Plans   62.67%            29.59%
Human Followers                    N/A               69.64%

Sample Successful Parse
Instruction: "Place your back against the wall of the 'T' intersection. Turn left. Go forward along the pink-flowered carpet hall two segments to the intersection with the brick hall. This intersection contains a hatrack. Turn left. Go forward three segments to an intersection with a bare concrete hall, passing a lamp. This is Position 5."
Parse: Turn(), Verify(back: WALL), Turn(LEFT), Travel(), Verify(side: BRICK HALLWAY), Turn(LEFT), Travel(steps: 3), Verify(side: CONCRETE HALLWAY)

Mandarin Chinese Experiment
Translated all the instructions from English to Chinese.
                           Single Sentences   Paragraphs
Trained on Refined Plans   58.70%             20.13%

Problem with Purely Correlational Lexicon Learning
The correlation between an n-gram w and a graph g can be affected by the context. Example:
- Bigram: "the wall"
- Sample uses: "turn so the wall is on your right side"; "with your back to the wall turn left"
- Co-occurring aspects of context: TURN(), VERIFY(direction: WALL)
- But "the wall" simply names an object and involves no action.

Syntactic Bootstrapping
Children sometimes use syntactic information to guide learning of word meanings (Gleitman, 1990). This complements Pinker's semantic bootstrapping, in which semantics is used to help learn syntax.

Using POS to Aid Lexicon Learning
Annotate each n-gram w with POS tags: dead/JJ end/NN
Annotate each node in the meaning graph g with a semantic-category tag: TURN/Action VERIFY/Action FORWARD/Action
(Reason: "dead end" is often followed by the action of turning around to face another direction so that there is a way to go forward.)

Constraints on a Lexicon Entry (w, g)
- The n-gram w should contain a noun if and only if the graph g contains an Object.
- The n-gram w should contain a verb if and only if the graph g contains an Action.
Example: dead/JJ end/NN is rejected when paired with TURN/Action VERIFY/Action FORWARD/Action, but allowed when paired with front/Relation WALL/Object.
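These two biconditionals amount to a simple filter over candidate lexicon entries. A sketch, assuming Penn Treebank POS tags on the n-gram and the semantic-category tags from the previous slide on the graph nodes:

```python
def satisfies_constraints(tagged_ngram, tagged_nodes):
    """Keep (w, g) only if nouns align with Objects and verbs with Actions.

    tagged_ngram: (word, POS) pairs, e.g. [("dead", "JJ"), ("end", "NN")]
    tagged_nodes: (node, category) pairs, e.g. [("WALL", "Object")]
    """
    has_noun = any(pos.startswith("NN") for _, pos in tagged_ngram)
    has_verb = any(pos.startswith("VB") for _, pos in tagged_ngram)
    has_object = any(cat == "Object" for _, cat in tagged_nodes)
    has_action = any(cat == "Action" for _, cat in tagged_nodes)
    return has_noun == has_object and has_verb == has_action

# "dead end" contains a noun but no verb, so it may not pair with an
# Action-only graph, but may pair with a graph containing an Object.
dead_end = [("dead", "JJ"), ("end", "NN")]
print(satisfies_constraints(dead_end, [("TURN", "Action"),
                                       ("VERIFY", "Action"),
                                       ("FORWARD", "Action")]))  # False
print(satisfies_constraints(dead_end, [("front", "Relation"),
                                       ("WALL", "Object")]))     # True
```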

Experimental Results
[Results chart from the original slides]

PCFG Induction Model for Grounded Language Learning (Borschinger et al. 2011)
PCFG rules describe the generative process from MR components to corresponding NL words.

Series of Grounded Language Learning Papers that Build Upon Each Other
- Kate & Mooney, AAAI-07
- Chen & Mooney, ICML-08
- Liang, Jordan, & Klein, ACL-09
- Kim & Mooney, COLING-10 (also integrates Lu, Ng, Lee, & Zettlemoyer, EMNLP-08)
- Borschinger, Jones, & Johnson, EMNLP-11
- Kim & Mooney, EMNLP-12

PCFG Induction Model for Grounded Language Learning (Borschinger et al. 2011)
Generative process:
- Select a complete MR to describe.
- Generate the atomic MR constituents in order.
- Each atomic MR generates NL words by a unigram Markov process.
Parameters are learned using EM (Inside-Outside). New NL sentences are parsed by reading the top MR nonterminal from the most probable parse tree. Output MRs are limited to those included in the PCFG rule set constructed from the training data.

Limitations of the Borschinger et al. PCFG Approach
- Only works in low-ambiguity settings, where each sentence can refer to only a few possible MRs.
- Only outputs MRs explicitly included in the PCFG constructed from the training data.
- Produces intractably large PCFGs for complex MRs with high ambiguity: it would require ~10^18 productions for our navigation data.

Our Enhanced PCFG Model (Kim & Mooney, EMNLP-2012)
- Use the learned semantic lexicon to constrain the constructed PCFG, limiting each MR to generate only words and phrases paired with that MR in the lexicon. Only ~18,000 productions are produced for the navigation data, compared to ~33,000 produced by Borschinger et al. for the far simpler RoboCup data.
- Output novel MRs not appearing in the PCFG by composing subgraphs from the overall context.
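A toy sketch of the lexicon-constrained rule-construction step only (no probabilities and no Inside-Outside training shown); representing MRs as tuples of atomic constituents, and all names, are illustrative assumptions.

```python
def build_pcfg_rules(mrs, lexicon):
    """Construct a constrained production set: each MR expands to its atomic
    constituents, and each constituent emits only the words/phrases paired
    with it in the learned lexicon. An unconstrained construction would pair
    every constituent with every word, blowing up the grammar."""
    rules = set()
    for mr in mrs:  # mr: tuple of atomic constituents
        rules.add(("MR:" + "+".join(mr), mr))          # MR -> constituents
        for atom in mr:
            for phrase in lexicon.get(atom, []):
                rules.add((atom, (phrase,)))           # constituent -> phrase
    return rules

mrs = [("Turn(LEFT)", "Travel(steps: 2)")]
lexicon = {"Turn(LEFT)": ["turn left"],
           "Travel(steps: 2)": ["walk forward twice", "go forward twice"]}
rules = build_pcfg_rules(mrs, lexicon)
```

With the emissions restricted this way, the rule count grows with the lexicon rather than with the full vocabulary, which is the effect behind the production counts quoted on the slide.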

End-to-End Execution Evaluations
                                         Single Sentences   Paragraphs
Mapping to supervised semantic parsing   57.28%             19.18%
Our PCFG model                           57.22%             20.17%

Conclusions
- Challenge problem: learn to follow NL instructions just by watching people follow them.
- Our goal: learn without assuming any prior linguistic knowledge, so the system can easily adapt to new languages.
- Exploit existing work on learning for semantic parsing to produce structured meaning representations that can handle complex instructions.
- Encouraging initial results on learning to navigate in a virtual world, but still far from human-level performance.