Learning to Sportscast: A Test of Grounded Language Acquisition

Slides:



Advertisements
Similar presentations
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Advertisements

1 CS 388: Natural Language Processing: N-Gram Language Models Raymond J. Mooney University of Texas at Austin.
Using Closed Captions to Train Activity Recognizers that Improve Video Retrieval Sonal Gupta and Raymond Mooney University of Texas at Austin.
Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser ICL, U. Heidelberg CIS, LMU München Statistical Machine Translation.
Online Max-Margin Weight Learning for Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science.
University of Texas at Austin Machine Learning Group Department of Computer Sciences University of Texas at Austin Learning Semantic Parsers Using Statistical.
Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2007) Learning for Semantic Parsing Advisor: Hsin-His.
1 Learning Language from its Perceptual Context Ray Mooney Department of Computer Sciences University of Texas at Austin Joint work with David Chen Joohyun.
January 12, Statistical NLP: Lecture 2 Introduction to Statistical NLP.
Advanced AI - Part II Luc De Raedt University of Freiburg WS 2004/2005 Many slides taken from Helmut Schmid.
Automatic Image Annotation and Retrieval using Cross-Media Relevance Models J. Jeon, V. Lavrenko and R. Manmathat Computer Science Department University.
1 Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang, Assistant Professor Dept. of Computer Science & Information Engineering National Central.
Presented by Zeehasham Rasheed
Statistical Natural Language Processing Advanced AI - Part II Luc De Raedt University of Freiburg WS 2005/2006 Many slides taken from Helmut Schmid.
Statistical Natural Language Processing. What is NLP?  Natural Language Processing (NLP), or Computational Linguistics, is concerned with theoretical.
1 Learning to Interpret Natural Language Navigation Instructions from Observation Ray Mooney Department of Computer Science University of Texas at Austin.
1 Learning Natural Language from its Perceptual Context Ray Mooney Department of Computer Science University of Texas at Austin Joint work with David Chen.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
Machine Learning Group Department of Computer Sciences University of Texas at Austin Learning Language Semantics from Ambiguous Supervision Rohit J. Kate.
Exploiting Ontologies for Automatic Image Annotation M. Srikanth, J. Varner, M. Bowden, D. Moldovan Language Computer Corporation
1 David Chen Advisor: Raymond Mooney Research Preparation Exam August 21, 2008 Learning to Sportscast: A Test of Grounded Language Acquisition.
David L. Chen Supervisor: Professor Raymond J. Mooney Ph.D. Dissertation Defense January 25, 2012 Learning Language from Ambiguous Perceptual Context.
David Chen Advisor: Raymond Mooney Research Preparation Exam August 21, 2008 Learning to Sportscast: A Test of Grounded Language Acquisition.
The Impact of Grammar Enhancement on Semantic Resources Induction Luca Dini Giampaolo Mazzini
Learning to Transform Natural to Formal Language Presented by Ping Zhang Rohit J. Kate, Yuk Wah Wong, and Raymond J. Mooney.
David L. Chen Fast Online Lexicon Learning for Grounded Language Acquisition The 50th Annual Meeting of the Association for Computational Linguistics (ACL)
An Extended GHKM Algorithm for Inducing λ-SCFG Peng Li Tsinghua University.
1 Using Perception to Supervise Language Learning and Language to Supervise Perception Ray Mooney Department of Computer Sciences University of Texas at.
1 Semi-Supervised Approaches for Learning to Parse Natural Languages Rebecca Hwa
David L. Chen and Raymond J. Mooney Department of Computer Science The University of Texas at Austin Learning to Interpret Natural Language Navigation.
1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.
Beyond Nouns Exploiting Preposition and Comparative adjectives for learning visual classifiers.
University of Texas at Austin Machine Learning Group Department of Computer Sciences University of Texas at Austin Learning for Semantic Parsing Raymond.
University of Texas at Austin Machine Learning Group Department of Computer Sciences University of Texas at Austin Learning a Compositional Semantic Parser.
Wei Lu, Hwee Tou Ng, Wee Sun Lee National University of Singapore
Combining Text and Image Queries at ImageCLEF2005: A Corpus-Based Relevance-Feedback Approach Yih-Cheng Chang Department of Computer Science and Information.
David Chen Supervising Professor: Raymond J. Mooney Doctoral Dissertation Proposal December 15, 2009 Learning Language from Perceptual Context 1.
Statistical Machine Translation Part II: Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Department of Computer Science The University of Texas at Austin USA Joint Entity and Relation Extraction using Card-Pyramid Parsing Rohit J. Kate Raymond.
Overview of Statistical NLP IR Group Meeting March 7, 2006.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
1 Learning Language from its Perceptual Context Ray Mooney Department of Computer Sciences University of Texas at Austin Joint work with David Chen Rohit.
The University of Illinois System in the CoNLL-2013 Shared Task Alla RozovskayaKai-Wei ChangMark SammonsDan Roth Cognitive Computation Group University.
Language Identification and Part-of-Speech Tagging
التوجيه الفني العام للغة الإنجليزية
Automatically Labeled Data Generation for Large Scale Event Extraction
Statistical Machine Translation Part II: Word Alignments and EM
Compiler Design (40-414) Main Text Book:
Semantic Parsing for Question Answering
Language skills Four skills – L,S,R,W Receptive skills
Listening Speaking Reading Class Preparation Class Preparation Class Preparation Class Preparation Online Tools Online Tools Online Tools Online Tools.
Statistical NLP: Lecture 13
Spoken Dialog System.
Using String-Kernels for Learning Semantic Parsers
Learning to Transform Natural to Formal Languages
Joohyun Kim Supervising Professor: Raymond J. Mooney
--Mengxue Zhang, Qingyang Li
Integrating Learning of Dialog Strategies and Semantic Parsing
Distributed Representation of Words, Sentences and Paragraphs
Anastassia Loukina, Klaus Zechner, James Bruno, Beata Beigman Klebanov
CSc4730/6730 Scientific Visualization
CS 388: Natural Language Processing: Semantic Parsing
Lecture 7: Introduction to Parsing (Syntax Analysis)
Eiji Aramaki* Sadao Kurohashi* * University of Tokyo
Learning a Policy for Opportunistic Active Learning
Using Natural Language Processing to Aid Computer Vision
Machine learning overview
Chapter 10: Compilers and Language Translation
Jointly Generating Captions to Aid Visual Question Answering
Faithful Multimodal Explanation for VQA
Presentation transcript:

Learning to Sportscast: A Test of Grounded Language Acquisition David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin

Motivation Constructing annotated corpora for language learning is difficult Children acquire language through exposure to linguistic input in the context of a rich, relevant, perceptual environment

Goals Learn to ground the semantics of language Block Learn language through correlated linguistic and visual inputs

Challenge

Challenge

Challenge A linguistic input may correspond to many possible events ? Block ? ?

Overview Sportscasting task Tactical generation Strategic generation Human evaluation

Learning to Sportscast Robocup Simulation League games No speech recognition Record commentaries in text form No computer vision Ruled-based system to automatically extract game events in symbolic form Concentrate on linguistic issues

Robocup Simulation League

Robocup Simulation League Pink4’s pass was intercepted by Purple6

Learning to Sportscast Learn to sportscast by observing sample human sportscasts Build a function that maps between natural language (NL) and meaning representation (MR) NL: Textual commentaries about the game MR: Predicate logic formulas that represent events in the game

Mapping between NL/MR NL: “Purple3 passes the ball to Purple5” Tactical Generation (MR  NL) Semantic Parsing (NL  MR) MR: Pass ( Purple3, Purple5 )

Robocup Sportscaster Trace Natural Language Commentary Meaning Representation badPass ( Purple1, Pink8 ) turnover ( Purple1, Pink8 ) Purple goalie turns the ball over to Pink8 kick ( Pink8) pass ( Pink8, Pink11 ) Purple team is very sloppy today kick ( Pink11 ) Pink8 passes the ball to Pink11 Pink11 looks around for a teammate kick ( Pink11 ) ballstopped kick ( Pink11 ) Pink11 makes a long pass to Pink8 pass ( Pink11, Pink8 ) kick ( Pink8 ) pass ( Pink8, Pink11 ) Pink8 passes back to Pink11 13

Robocup Sportscaster Trace Natural Language Commentary Meaning Representation badPass ( Purple1, Pink8 ) turnover ( Purple1, Pink8 ) Purple goalie turns the ball over to Pink8 kick ( Pink8) pass ( Pink8, Pink11 ) Purple team is very sloppy today kick ( Pink11 ) Pink8 passes the ball to Pink11 Pink11 looks around for a teammate kick ( Pink11 ) ballstopped kick ( Pink11 ) Pink11 makes a long pass to Pink8 pass ( Pink11, Pink8 ) kick ( Pink8 ) pass ( Pink8, Pink11 ) Pink8 passes back to Pink11 14

Robocup Sportscaster Trace Natural Language Commentary Meaning Representation badPass ( Purple1, Pink8 ) turnover ( Purple1, Pink8 ) Purple goalie turns the ball over to Pink8 kick ( Pink8) pass ( Pink8, Pink11 ) Purple team is very sloppy today kick ( Pink11 ) Pink8 passes the ball to Pink11 Pink11 looks around for a teammate kick ( Pink11 ) ballstopped kick ( Pink11 ) Pink11 makes a long pass to Pink8 pass ( Pink11, Pink8 ) kick ( Pink8 ) pass ( Pink8, Pink11 ) Pink8 passes back to Pink11 15

Robocup Sportscaster Trace Natural Language Commentary Meaning Representation P6 ( C1, C19 ) P5 ( C1, C19 ) Purple goalie turns the ball over to Pink8 P1( C19 ) P2 ( C19, C22 ) Purple team is very sloppy today P1 ( C22 ) Pink8 passes the ball to Pink11 Pink11 looks around for a teammate P1 ( C22 ) P0 P1 ( C22 ) Pink11 makes a long pass to Pink8 P2 ( C22, C19 ) P1 ( C19 ) P2 ( C19, C22 ) Pink8 passes back to Pink11 16

Robocup Data Collected human textual commentary for the 4 Robocup championship games from 2001-2004. Avg # events/game = 2,613 Avg # sentences/game = 509 Each sentence matched to all events within previous 5 seconds. Avg # MRs/sentence = 2.5 (min 1, max 12) Manually annotated with correct matchings of sentences to MRs (for evaluation purposes only).

Overview Sportscasting task Tactical generation Strategic generation Human evaluation

Tactical Generation Learn how to generate NL from MR Example: Two steps Disambiguate the training data Learn a language generator Pass(Pink2, Pink3)  “Pink2 kicks the ball to Pink3”

System Overview Ambiguous Training Data Sportscaster Robocup Simulator Pass ( Purple5, Purple7 ) Turnover ( purple7 , pink2 ) Purple7 loses the ball to Pink2 Kick ( pink2 ) Pass ( pink2 , pink5 ) Pink2 kicks the ball to Pink5 Kick ( pink5 ) Pass ( pink5 , pink8) Pink5 makes a long pass to Pink8 Ballstopped Kick ( pink8 ) Pink8 shoots the ball Ambiguous Training Data

Semantic Parser Learner System Overview Sportscaster Robocup Simulator Pass ( Purple5, Purple7 ) Initial Semantic Parser Turnover ( purple7 , pink2 ) Purple7 loses the ball to Pink2 Kick ( pink2 ) Pass ( pink2 , pink5 ) Pink2 kicks the ball to Pink5 Kick ( pink5 ) Pass ( pink5 , pink8) Pink5 makes a long pass to Pink8 Ballstopped Kick ( pink8 ) Pink8 shoots the ball Semantic Parser Learner Ambiguous Training Data

System Overview Unambiguous Training Data Initial Semantic Parser Purple7 loses the ball to Pink2 Kick ( pink2 ) Sportscaster Robocup Simulator Pink2 kicks the ball to Pink5 Pass ( pink2 , pink5 ) Pink5 makes a long pass to Pink8 Kick ( pink5 ) Pink8 shoots the ball Kick ( pink8 ) Unambiguous Training Data Pass ( purple5, purple7 ) Initial Semantic Parser Turnover ( purple7 , pink2 ) Purple7 loses the ball to Pink2 Kick ( pink2 ) Pass ( pink2 , pink5 ) Pink2 kicks the ball to Pink5 Kick ( pink5 ) Pass ( pink5 , pink8) Pink5 makes a long pass to Pink8 Ballstopped Kick ( pink8 ) Pink8 shoots the ball Ambiguous Training Data

Semantic Parser Learner System Overview Purple7 loses the ball to Pink2 Kick ( pink2 ) Sportscaster Robocup Simulator Pink2 kicks the ball to Pink5 Pass ( pink2 , pink5 ) Pink5 makes a long pass to Pink8 Kick ( pink5 ) Pink8 shoots the ball Kick ( pink8 ) Unambiguous Training Data Pass ( purple5, purple7 ) Semantic Parser Turnover ( purple7 , pink2 ) Purple7 loses the ball to Pink2 Kick ( pink2 ) Pass ( pink2 , pink5 ) Pink2 kicks the ball to Pink5 Kick ( pink5 ) Pass ( pink5 , pink8) Pink5 makes a long pass to Pink8 Ballstopped Kick ( pink8 ) Pink8 shoots the ball Semantic Parser Learner Ambiguous Training Data

Semantic Parser Learner System Overview Turnover ( purple7 , pink2 ) Purple7 loses the ball to Pink2 Sportscaster Robocup Simulator Pink2 kicks the ball to Pink5 Pass ( pink2 , pink5 ) Pink5 makes a long pass to Pink8 Kick ( pink5 ) Pink8 shoots the ball Kick ( pink8 ) Unambiguous Training Data Pass ( purple5, purple7 ) Semantic Parser Turnover ( purple7 , pink2 ) Purple7 loses the ball to Pink2 Kick ( pink2 ) Pass ( pink2 , pink5 ) Pink2 kicks the ball to Pink5 Kick ( pink5 ) Pass ( pink5 , pink8) Pink5 makes a long pass to Pink8 Ballstopped Kick ( pink8 ) Pink8 shoots the ball Semantic Parser Learner Ambiguous Training Data

Semantic Parser Learner System Overview Turnover ( purple7 , pink2 ) Purple7 loses the ball to Pink2 Sportscaster Robocup Simulator Pink2 kicks the ball to Pink5 Pass ( pink2 , pink5 ) Pink5 makes a long pass to Pink8 Kick ( pink5 ) Pink8 shoots the ball Kick ( pink8 ) Unambiguous Training Data Pass ( purple5, purple7 ) Semantic Parser Turnover ( purple7 , pink2 ) Purple7 loses the ball to Pink2 Kick ( pink2 ) Pass ( pink2 , pink5 ) Pink2 kicks the ball to Pink5 Kick ( pink5 ) Pass ( pink5 , pink8) Pink5 makes a long pass to Pink8 Ballstopped Kick ( pink8 ) Pink8 shoots the ball Semantic Parser Learner Ambiguous Training Data

Semantic Parser Learner System Overview Turnover ( purple7 , pink2 ) Purple7 loses the ball to Pink2 Sportscaster Robocup Simulator Pink2 kicks the ball to Pink5 Pass ( pink2 , pink5 ) Pink5 makes a long pass to Pink8 Pass ( pink5 , pink8) Pink8 shoots the ball Kick ( pink8 ) Unambiguous Training Data Pass ( purple5, purple7 ) Semantic Parser Turnover ( purple7 , pink2 ) Purple7 loses the ball to Pink2 Kick ( pink2 ) Pass ( pink2 , pink5 ) Pink2 kicks the ball to Pink5 Kick ( pink5 ) Pass ( pink5 , pink8) Pink5 makes a long pass to Pink8 Ballstopped Kick ( pink8 ) Pink8 shoots the ball Semantic Parser Learner Ambiguous Training Data

Semantic Parser Learners Learn a function from NL to MR NL: “Purple3 passes the ball to Purple5” Semantic Parsing (NL  MR) Tactical Generation (MR  NL) MR: Pass ( Purple3, Purple5 ) We experiment with two semantic parser learners WASP (Wong & Mooney, 2006; 2007) KRISP (Kate & Mooney, 2006)

WASP: Word Alignment-based Semantic Parsing Uses statistical machine translation techniques Synchronous context-free grammars (SCFG) (Wu, 1997; Melamed, 2004; Chiang, 2005) Word alignments (Brown et al., 1993; Och & Ney, 2003) Capable of both semantic parsing and tactical generation

KRISP: Kernel-based Robust Interpretation by Semantic Parsing Productions of MR language are treated like semantic concepts SVM classifier is trained for each production with string subsequence kernel These classifiers are used to compositionally build MRs of the sentences More resistant to noisy supervision but incapable of tactical generation

Matching Ability to find correct NL/MR pair 4 Robocup championship games from 2001-2004. Avg # events/game = 2,613 Avg # sentences/game = 509 Leave-one-game-out cross-validation Metric: Precision: % of system’s annotations that are correct Recall: % of gold-standard annotations correctly produced F-measure: Harmonic mean of precision and recall

WASP’s language generator Systems Learner KRISPER (Kate & Mooney, 2007) KRISP WASPER WASP WASPER-GEN WASP’s language generator

Semantic Parser Learner KRISPER and WASPER Turnover ( purple7 , pink2 ) Purple7 loses the ball to Pink2 Sportscaster Robocup Simulator Pink2 kicks the ball to Pink5 Pass ( pink2 , pink5 ) Pink5 makes a long pass to Pink8 Kick ( pink5 ) Pink8 shoots the ball Kick ( pink8 ) Unambiguous Training Data Pass ( purple5, purple7 ) Semantic Parser Turnover ( purple7 , pink2 ) Purple7 loses the ball to Pink2 Kick ( pink2 ) Pass ( pink2 , pink5 ) Pink2 kicks the ball to Pink5 Kick ( pink5 ) Pass ( pink5 , pink8) Pink5 makes a long pass to Pink8 Ballstopped Kick ( pink8 ) Pink8 shoots the ball Semantic Parser Learner (KRISP/WASP) Ambiguous Training Data

WASP’s language generator Systems Learner KRISPER (Kate & Mooney, 2007) KRISP WASPER WASP WASPER-GEN WASP’s language generator

Tactical Generator Learner WASPER-GEN Turnover ( purple7 , pink2 ) Purple7 loses the ball to Pink2 Sportscaster Robocup Simulator Pink2 kicks the ball to Pink5 Pass ( pink2 , pink5 ) Pink5 makes a long pass to Pink8 Kick ( pink5 ) Pink8 shoots the ball Kick ( pink8 ) Unambiguous Training Data Pass ( purple5, purple7 ) Tactical Generator Turnover ( purple7 , pink2 ) Purple7 loses the ball to Pink2 Kick ( pink2 ) Pass ( pink2 , pink5 ) Pink2 kicks the ball to Pink5 Kick ( pink5 ) Pass ( pink5 , pink8) Pink5 makes a long pass to Pink8 Ballstopped Kick ( pink8 ) Pink8 shoots the ball Tactical Generator Learner (WASP) Ambiguous Training Data

Matching Results

Overview Sportscasting task Tactical generation Strategic generation Human evaluation

Strategic Generation Generation requires not only knowing how to say something (tactical generation) but also what to say (strategic generation). For automated sportscasting, one must be able to effectively choose which events to describe.

Example of Strategic Generation pass ( purple7 , purple6 ) ballstopped kick ( purple6 ) pass ( purple6 , purple2 ) kick ( purple2 ) pass ( purple2 , purple3 ) kick ( purple3 ) badPass ( purple3 , pink9 ) turnover ( purple3 , pink9 )

Example of Strategic Generation pass ( purple7 , purple6 ) ballstopped kick ( purple6 ) pass ( purple6 , purple2 ) kick ( purple2 ) pass ( purple2 , purple3 ) kick ( purple3 ) badPass ( purple3 , pink9 ) turnover ( purple3 , pink9 )

Strategic Generation For each event type (e.g. pass, kick) estimate the probability that it is described by the sportscaster. Requires correct NL/MR matching Use estimated matching from tactical generation Iterative Generation Strategy Learning

Iterative Generation Strategy Learning (IGSL) Directly estimates the likelihood of an event being commented on Self-training iterations to improve estimates Uses events not associated with any NL as negative evidence

Strategic Generation Performance Evaluate how well the system can predict which events a human comments on Metric: Precision: % of system’s annotations that are correct Recall: % of gold-standard annotations correctly produced F-measure: Harmonic mean of precision and recall

Strategic Generation Results

Overview Sportscasting task Tactical generation Strategic generation Human evaluation

Human Evaluation (Quasi Turing Test) 4 fluent English speakers as judges 8 commented game clips 2 minute clips randomly selected from each of the 4 games Each clip commented once by a human, and once by the machine Presented in random counter-balanced order Judges were not told which ones were human or machine generated

Demo Clip Game clip commentated using WASPER-GEN with IGSL, since this gave the best results for generation. FreeTTS was used to synthesize speech from textual output.

Human Evaluation Score English Fluency Semantic Correctness Sportscasting Ability 5 Flawless Always Excellent 4 Good Usually 3 Non-native Sometimes Average 2 Disfluent Rarely Bad 1 Gibberish Never Terrible 47

Human Evaluation Commentator English Fluency Semantic Correctness Score English Fluency Semantic Correctness Sportscasting Ability 5 Flawless Always Excellent 4 Good Usually 3 Non-native Sometimes Average 2 Disfluent Rarely Bad 1 Gibberish Never Terrible Commentator English Fluency Semantic Correctness Sportscasting Ability Human 3.94 4.25 3.63 Machine 3.44 3.56 2.94 Difference 0.5 0.69 48

Future Work Expand MRs to beyond simple logic formulas Apply approach to learning situated language in a computer video-game environment (Gorniak & Roy, 2005) Apply approach to captioned images or video using computer vision to extract objects, relations, and events from real perceptual data (Fleischman & Roy, 2007)

Conclusion Current language learning work uses expensive, unrealistic training data. We have developed a language learning system that can learn from language paired with an ambiguous perceptual environment. We have evaluated it on the task of learning to sportscast simulated Robocup games. The system learns to sportscast almost as well as humans.