Learning Natural Language from its Perceptual Context
Ray Mooney, Department of Computer Science, University of Texas at Austin
Joint work with David Chen and Joohyun Kim

Machine Learning and Natural Language Processing (NLP)
Manual software development of robust NLP systems has proven very difficult and time-consuming.
Most current state-of-the-art NLP systems are constructed using machine learning methods trained on large supervised corpora.

Syntactic Parsing of Natural Language
Produce the correct syntactic parse tree for a sentence.
Train and test on the Penn Treebank, which contains tens of thousands of manually parsed sentences.

Word Sense Disambiguation (WSD)
Determine the proper dictionary sense of a word from its sentential context.
– Ellen has a strong interest (sense 1) in computational linguistics.
– Ellen pays a large amount of interest (sense 4) on her credit card.
Train and test on Senseval corpora containing hundreds of disambiguated instances of each target word.

Semantic Parsing
A semantic parser maps a natural-language (NL) sentence to a complete, detailed formal semantic representation: a logical form or meaning representation (MR).
For many applications, the desired output is a computer language that another program can execute directly.

Database Query Application
Query application for a U.S. geography database [Zelle & Mooney, 1996]
User: How many states does the Mississippi run through?
Semantic parsing produces the query: answer(A, count(B, (state(B), C=riverid(mississippi), traverse(C,B)), A))
Executing the query against the database returns the answer: 10

CLang: RoboCup Coach Language
In the RoboCup Coach competition, teams compete to coach simulated soccer players.
The coaching instructions are given in a formal language called CLang.
NL: If the ball is in our penalty area, then all our players except player 4 should stay in our half.
Semantic parsing produces the CLang rule: ((bpos (penalty-area our)) (do (player-except our{4}) (pos (half our))))

Learning Semantic Parsers
Semantic parsers can be learned automatically from sentences paired with their logical forms.
[Diagram: NL→MR training examples feed a Semantic-Parser Learner, which outputs a Semantic Parser mapping Natural Language to Meaning Representations.]
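To make this concrete, here is a minimal sketch of what such supervised training data looks like and of the learner interface it feeds. The GeoQuery-style pair is the example from the database-query slide above; the SemanticParserLearner class is a hypothetical placeholder, not the actual learner.

```python
# Illustrative only: supervised training data for a semantic parser is a set of
# NL sentences paired with their meaning representations (MRs).
training_pairs = [
    ("How many states does the Mississippi run through?",
     "answer(A, count(B, (state(B), C=riverid(mississippi), traverse(C,B)), A))"),
    # ... more (sentence, MR) pairs
]

class SemanticParserLearner:
    """Hypothetical stand-in for a supervised semantic-parser learner."""
    def train(self, pairs):
        """Induce a parser from (sentence, MR) pairs."""
        ...

    def parse(self, sentence):
        """Map a new sentence to its most likely MR."""
        ...
```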

Limitations of Supervised Learning
Constructing supervised training data can be difficult, expensive, and time-consuming.
For many problems, machine learning has simply replaced the burden of knowledge and software engineering with the burden of supervised data collection.

Learning Language from Perceptual Context
Children do not learn language from annotated corpora. Neither do they learn language from just reading the newspaper, surfing the web, or listening to the radio.
– Unsupervised language learning is difficult and not an adequate solution, since much of the requisite information is not in the linguistic signal.
The natural way to learn language is to perceive language in the context of its use in the physical and social world.
This requires inferring the meaning of utterances from their perceptual context.

Language Grounding
The meanings of many words are grounded in our perception of the physical world: red, ball, cup, run, hit, fall, etc.
– Symbol Grounding: Harnad (1990)
Even many abstract words and meanings are metaphorical abstractions of terms grounded in the physical world: up, down, over, in, etc.
– Lakoff and Johnson's Metaphors We Live By: "It's difficult to put my ideas into words."
Most NLP work represents meaning without any connection to perception, circularly defining the meanings of words in terms of other words or meaningless symbols with no firm foundation.

Sample Circular Definitions from WordNet
sleep (v): "be asleep"
asleep (adj): "in a state of sleep"

Initial Challenge Problem: Learn to Be a Sportscaster
Goal: learn from realistic data of natural language used in a representative context while avoiding difficult issues in computer perception (i.e., speech and vision).
Solution: learn from textually annotated traces of activity in a simulated environment.
Example: traces of games in the Robocup simulator paired with textual sportscaster commentary.

Grounded Language Learning in Robocup
[System diagram: the Robocup Simulator is observed through Simulated Perception, yielding Perceived Facts; a human Sportscaster provides commentary (e.g., "Score!!!!"); from the paired facts and commentary, the Grounded Language Learner produces a Semantic Parser and a Language Generator (SCFG), which can in turn generate commentary such as "Score!!!!".]

Sample Human Sportscast in Korean

Robocup Sportscaster Trace
Natural Language Commentary:
– Purple goalie turns the ball over to Pink8
– Pink11 looks around for a teammate
– Pink8 passes the ball to Pink11
– Purple team is very sloppy today
– Pink11 makes a long pass to Pink8
– Pink8 passes back to Pink11
Meaning Representation (events extracted from the simulator over the same interval):
badPass ( Purple1, Pink8 ), turnover ( Purple1, Pink8 ), pass ( Pink11, Pink8 ), pass ( Pink8, Pink11 ), ballstopped, pass ( Pink8, Pink11 ), kick ( Pink11 ), kick ( Pink8 ), kick ( Pink11 ), kick ( Pink8 )

Robocup Sportscaster Trace (same trace with predicate and constant names replaced by uninterpreted symbols)
Natural Language Commentary: the same six sentences as above.
Meaning Representation: P6 ( C1, C19 ), P5 ( C1, C19 ), P2 ( C22, C19 ), P2 ( C19, C22 ), P0, P2 ( C19, C22 ), P1 ( C22 ), P1 ( C19 ), P1 ( C22 ), P1 ( C19 )

Strategic Generation (Content Selection)
Generation requires not only knowing how to say something (tactical generation) but also what to say (strategic generation).
For automated sportscasting, one must be able to effectively choose which events to describe.

Example of Strategic Generation
Extracted events:
pass ( purple7, purple6 )
ballstopped
kick ( purple6 )
pass ( purple6, purple2 )
ballstopped
kick ( purple2 )
pass ( purple2, purple3 )
kick ( purple3 )
badPass ( purple3, pink9 )
turnover ( purple3, pink9 )
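As a rough illustration of content selection, the sketch below estimates how often each event type was commented on in training data and keeps only events of types that are usually described. This is a simplified stand-in under assumed data structures, not necessarily the selection model used in the actual system.

```python
from collections import Counter

def learn_event_type_priorities(training_events):
    """training_events: (event_type, was_commented) pairs gathered from games
    with human commentary. Returns an estimate of P(commented | event type)."""
    seen, commented = Counter(), Counter()
    for event_type, was_commented in training_events:
        seen[event_type] += 1
        if was_commented:
            commented[event_type] += 1
    return {t: commented[t] / seen[t] for t in seen}

def select_events(events, priorities, threshold=0.5):
    """events: (event_type, *args) tuples, e.g. ("pass", "purple7", "purple6").
    Keep only events whose type is usually worth describing."""
    return [e for e in events if priorities.get(e[0], 0.0) >= threshold]
```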

Robocup Data
Collected human textual commentary for the four Robocup championship games from 2001 to 2004.
– Avg. # events/game = 2,613
– Avg. # English sentences/game = 509
– Avg. # Korean sentences/game = 499
Each sentence was matched to all events within the previous 5 seconds.
– Avg. # MRs/sentence = 2.5 (min 1, max 12)
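The sketch below shows how this kind of ambiguous supervision could be assembled, assuming timestamped events and sentences; the data layout and function name are hypothetical, not the actual corpus format.

```python
WINDOW = 5.0  # seconds: a sentence is paired with every event in the preceding 5 s

def build_ambiguous_data(events, sentences):
    """events: list of (time, mr) tuples; sentences: list of (time, text) tuples.
    Returns a list of (text, [candidate MRs]) items."""
    data = []
    for s_time, text in sentences:
        candidates = [mr for e_time, mr in events
                      if s_time - WINDOW <= e_time <= s_time]
        if candidates:              # sentences with no nearby event are dropped
            data.append((text, candidates))
    return data
```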

Algorithm Outline
Use EM-like iterative retraining with an existing supervised semantic-parser learner to resolve the ambiguous training data (see Chen, Kim, & Mooney, JAIR 2010, for details):
– Let each possible NL-MR pair be a (noisy) positive training example.
– Until the parser converges:
  – Train the supervised parser on the current (noisy) training examples.
  – Use the current trained parser to pick the best MR for each NL sentence.
  – Create new training examples based on these assignments.
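A compact sketch of this loop, assuming a supervised learner object with hypothetical train() and score() methods standing in for the actual parser learner:

```python
def iterative_retrain(ambiguous_data, learner, max_iters=20):
    """ambiguous_data: list of (sentence, [candidate MRs]) items."""
    # Step 1: treat every sentence/candidate-MR pair as a (noisy) positive example.
    training = [(text, mr) for text, mrs in ambiguous_data for mr in mrs]
    prev = None
    for _ in range(max_iters):
        learner.train(training)
        # Step 2: let the current parser pick the best candidate MR for each sentence.
        best = {text: max(mrs, key=lambda mr: learner.score(text, mr))
                for text, mrs in ambiguous_data}
        if best == prev:            # assignments stopped changing: converged
            break
        prev = best
        # Step 3: build new, less noisy training examples from these assignments.
        training = list(best.items())
    return learner
```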

Machine Sportscast in English

Experimental Evaluation
Evaluated the ability of the system to accurately:
– Match sentences to their correct meanings
– Parse sentences into formal meanings
– Generate sentences from formal meanings
– Pick which events are worth talking about
See Chen, Kim, & Mooney (JAIR, 2010) for details.

Human Evaluation of Sportscasts: "Pseudo Turing Test"
Used Amazon's Mechanical Turk to recruit human judges (36 English and 7 Korean judges per video).
8 commented game clips:
– 4-minute clips randomly selected from each of the 4 games
– Each clip commented once by a human and once by the machine
Judges were not told which commentaries were human- or machine-generated.

Human Evaluation Metrics

Score   English Fluency   Semantic Correctness   Sportscasting Ability
5       Flawless          Always                 Excellent
4       Good              Usually                Good
3       Non-native        Sometimes              Average
2       Disfluent         Rarely                 Bad
1       Gibberish         Never                  Terrible

Human? Judges were also asked to predict whether a human or a machine generated each sportscast, knowing that the data contained some of each.

Pseudo-Turing-Test Results
[Results tables for English and Korean: the Human and Machine commentators are compared on Fluency, Semantic Correctness, and Sportscasting Ability, along with the percentage of clips each was judged to be human-generated.]

Challenge Problem #2: Learning to Follow Directions in a Virtual World
Learn to interpret navigation instructions in a virtual environment by simply observing humans giving and following such directions (Chen & Mooney, AAAI-11).
Eventual goal: virtual agents in video games and educational software that automatically learn to take and give instructions in natural language.

Sample Environment (MacMahon et al., AAAI-06)
[Map of a virtual indoor environment with objects placed at intersections]
Legend: H – Hat Rack, L – Lamp, E – Easel, S – Sofa, B – Barstool, C – Chair

Sample Instructions
– Take your first left. Go all the way down until you hit a dead end.
– Go towards the coat hanger and turn left at it. Go straight down the hallway and the dead end is position 4.
– Walk to the hat rack. Turn left. The carpet should have green octagons. Go to the end of this alley. This is p-4.
– Walk forward once. Turn left. Walk forward twice.
[Map: the route starts at position 3 and ends at position 4, passing a hat rack (H).]

Sample Instructions (continued)
Observed primitive actions for the route above: Forward, Left, Forward, Forward

Instruction Following Demo (Navigation Demo Applet)

Formal Problem Definition
Given: { (e_1, a_1, w_1), (e_2, a_2, w_2), …, (e_n, a_n, w_n) }
– e_i : a natural-language instruction
– a_i : an observed action sequence
– w_i : a world state
Goal: build a system that produces the correct a_j given a previously unseen (e_j, w_j).
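This definition translates directly into a simple data structure; the sketch below is illustrative only, and the field representations (e.g., actions as strings) are assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class NavigationExample:
    instruction: str        # e_i: a natural-language instruction
    actions: List[str]      # a_i: the observed action sequence, e.g. ["Forward", "Left"]
    world_state: dict       # w_i: the state of the world when the instruction was given

# Goal: from training triples {(e_i, a_i, w_i)}, learn
#   follow(instruction, world_state) -> action sequence
# that produces the correct a_j for a previously unseen (e_j, w_j).
```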

Learning System for Parsing Navigation Instructions
Training: each observation consists of an instruction, a world state, and the follower's observed action trace.
– A Navigation Plan Constructor turns the action trace and world state into a candidate navigation plan.
– A Semantic Parser Learner is trained on instructions paired with these candidate plans.
– A Plan Refinement step refines the plans the parser is trained on.
Testing: the learned Semantic Parser maps a new instruction and world state to a navigation plan, which an Execution Module (MARCO) carries out, producing an action trace.
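Read as a pipeline, the architecture looks roughly like the sketch below; the component functions are passed in as parameters because this is a schematic of the slide, not the actual implementation.

```python
def train(observations, construct_plan, learn_parser, refine_plan):
    """Training: observations carry an instruction, a world_state, and the
    follower's observed actions (as in NavigationExample above)."""
    # Navigation Plan Constructor: derive a candidate plan from each action trace.
    plans = [construct_plan(ob.actions, ob.world_state) for ob in observations]
    # Semantic Parser Learner: train on instructions paired with candidate plans.
    parser = learn_parser([(ob.instruction, p) for ob, p in zip(observations, plans)])
    # Plan Refinement: refine the candidate plans and retrain the parser on them.
    refined = [refine_plan(p, parser) for p in plans]
    return learn_parser([(ob.instruction, p) for ob, p in zip(observations, refined)])

def follow(parser, execute, instruction, world_state):
    """Testing: parse a new instruction into a plan, then execute it (e.g., with MARCO)."""
    plan = parser.parse(instruction)
    return execute(plan, world_state)   # returns the resulting action trace
```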

Evaluation Data Statistics
3 maps, 6 instructors, 1–15 followers per direction
Hand-segmented into single-sentence steps

                    Paragraph       Single-Sentence
# Instructions      –               –
Avg. # sentences    5.0 (±2.8)      1.0 (±0)
Avg. # words        37.6 (±21.1)    7.8 (±5.1)
Avg. # actions      10.4 (±5.7)     2.1 (±2.4)

End-to-End Execution Evaluation
Test how well the system follows novel directions.
Leave-one-map-out cross-validation.
Strict metric: only correct if the final position exactly matches the goal location.
Lower baseline: simple probabilistic generative model of executed plans without language.
Upper baselines: semantic parser trained on human-annotated plans; human followers.
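Concretely, the strict metric is an exact match of final positions, as in this small illustrative function:

```python
def end_to_end_accuracy(executions):
    """executions: list of (final_position, goal_position) pairs, one per test direction.
    An execution is correct only if the follower ends exactly at the goal."""
    correct = sum(1 for final, goal in executions if final == goal)
    return 100.0 * correct / len(executions)
```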

End-to-End Execution Accuracy
[Results table with Single-Sentence and Complete accuracy for: Simple Generative Model, Landmarks Plans, Refined Landmarks Plans, Human Annotated Plans, and Human Followers (Single-Sentence: N/A, Complete: 69.64).]

Sample Successful Parse
Instruction: "Place your back against the wall of the 'T' intersection. Turn left. Go forward along the pink-flowered carpet hall two segments to the intersection with the brick hall. This intersection contains a hatrack. Turn left. Go forward three segments to an intersection with a bare concrete hall, passing a lamp. This is Position 5."
Parse: Turn ( ), Verify ( back: WALL ), Turn ( LEFT ), Travel ( ), Verify ( side: BRICK HALLWAY ), Turn ( LEFT ), Travel ( steps: 3 ), Verify ( side: CONCRETE HALLWAY )

Future Challenge Area: Learning for Language and Vision
Natural Language Processing (NLP) and Computer Vision (CV) are both very challenging problems.
Machine Learning (ML) is now extensively used to automate the construction of both effective NLP and CV systems.
This generally relies on supervised ML, which requires difficult and expensive human annotation of large text or image/video corpora for training.

Cross-Supervision of Language and Vision
Use naturally co-occurring perceptual input to supervise language learning.
Use naturally co-occurring linguistic input to supervise visual learning.
[Diagram: an image paired with the caption "Blue cylinder on top of a red cube"; each modality serves as input to one learner (Language Learner, Vision Learner) and as supervision for the other.]

Conclusions
Current language-learning approaches use expensive, unrealistic training data.
We have developed language-learning systems that learn from sentences paired with an ambiguous, naturally occurring perceptual environment.
We have explored two challenge problems:
– Learning to sportscast simulated Robocup games: able to commentate games about as well as humans.
– Learning to follow navigation directions: able to accurately follow 55% of instructional sentences in a novel environment.