1 Learning Language from its Perceptual Context Ray Mooney Department of Computer Sciences University of Texas at Austin Joint work with David Chen, Rohit Kate, and Yuk Wah Wong

Current State of Natural Language Learning Most current state-of-the-art NLP systems are constructed by training on large supervised corpora. –Syntactic Parsing: Penn Treebank –Word Sense Disambiguation: SenseEval –Semantic Role Labeling: Propbank –Machine Translation: Hansards corpus Constructing such annotated corpora is difficult, expensive, and time-consuming. 2

3 Semantic Parsing A semantic parser maps a natural-language sentence to a complete, detailed semantic representation: logical form or meaning representation (MR). For many applications, the desired output is immediately executable by another program. Two application domains: –GeoQuery: A Database Query Application –CLang: RoboCup Coach Language

4 GeoQuery: A Database Query Application Query application for U.S. geography database [Zelle & Mooney, 1996]. User: "How many states does the Mississippi run through?" → Semantic Parsing → Query: answer(A, count(B, (state(B), C=riverid(mississippi), traverse(C,B)), A)) → Database → Answer: 10

5 CLang: RoboCup Coach Language In the RoboCup Coach competition, teams compete to coach simulated soccer players. The coaching instructions are given in a formal language called CLang. Coach: "If the ball is in our penalty area, then all our players except player 4 should stay in our half." → Semantic Parsing → CLang: ((bpos (penalty-area our)) (do (player-except our{4}) (pos (half our))))

6 Learning Semantic Parsers Manually programming robust semantic parsers is difficult due to the complexity of the task. Semantic parsers can be learned automatically from sentences paired with their logical forms: NL→MR training examples are given to a semantic-parser learner, which produces a semantic parser mapping natural language to meaning representations.

7 Our Semantic-Parser Learners CHILL+WOLFIE (Zelle & Mooney, 1996; Thompson & Mooney, 1999, 2003) –Separates parser-learning and semantic-lexicon learning. –Learns a deterministic parser using ILP techniques. COCKTAIL (Tang & Mooney, 2001) –Improved ILP algorithm for CHILL. SILT (Kate, Wong & Mooney, 2005) –Learns symbolic transformation rules for mapping directly from NL to MR. SCISSOR (Ge & Mooney, 2005) –Integrates semantic interpretation into Collins’ statistical syntactic parser. WASP (Wong & Mooney, 2006; 2007) –Uses syntax-based statistical machine translation methods. KRISP (Kate & Mooney, 2006) –Uses a series of SVM classifiers employing a string-kernel to iteratively build semantic representations.

8 WASP A Machine Translation Approach to Semantic Parsing Uses statistical machine translation techniques –Synchronous context-free grammars (SCFG) (Wu, 1997; Melamed, 2004; Chiang, 2005) –Word alignments (Brown et al., 1993; Och & Ney, 2003) Hence the name: Word Alignment-based Semantic Parsing
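To make the word-alignment idea concrete, here is a minimal IBM Model 1 EM sketch that aligns NL words to MR tokens on a tiny invented corpus. The toy data and the flat treatment of MR expressions as token sequences are simplifying assumptions for illustration, not WASP's actual lexical-acquisition procedure.

```python
from collections import defaultdict

# Minimal IBM Model 1 EM sketch: learn P(NL word | MR token) from NL/MR pairs.
# Toy corpus; MR expressions are flattened into token sequences for simplicity.
corpus = [
    ("what is the capital of ohio".split(),  "answer capital loc_2 stateid_ohio".split()),
    ("what is the capital of texas".split(), "answer capital loc_2 stateid_texas".split()),
    ("what rivers run through ohio".split(), "answer river traverse stateid_ohio".split()),
]

# t[f][e] = P(NL word e | MR token f), initialized uniformly.
t = defaultdict(lambda: defaultdict(lambda: 1.0))

for _ in range(10):                      # a few EM iterations
    count = defaultdict(lambda: defaultdict(float))
    total = defaultdict(float)
    for e_sent, f_sent in corpus:
        for e in e_sent:                 # E-step: expected alignment counts
            norm = sum(t[f][e] for f in f_sent)
            for f in f_sent:
                c = t[f][e] / norm
                count[f][e] += c
                total[f] += c
    for f in count:                      # M-step: re-estimate translation probs
        for e in count[f]:
            t[f][e] = count[f][e] / total[f]

# The MR token stateid_ohio should now align most strongly to the word "ohio".
print(sorted(t["stateid_ohio"].items(), key=lambda x: -x[1])[:3])
```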

9 A Unifying Framework for Parsing and Generation Natural Languages Machine translation

10 A Unifying Framework for Parsing and Generation Natural Languages Formal Languages Semantic parsing Machine translation

11 A Unifying Framework for Parsing and Generation Natural Languages Formal Languages Semantic parsing Tactical generation Machine translation

12 A Unifying Framework for Parsing and Generation Natural Languages Formal Languages Semantic parsing Tactical generation Machine translation Synchronous Parsing

13 A Unifying Framework for Parsing and Generation Natural Languages Formal Languages Semantic parsing Tactical generation Machine translation Compiling: Aho & Ullman (1972) Synchronous Parsing

14 Synchronous Context-Free Grammars (SCFG) Developed by Aho & Ullman (1972) as a theory of compilers that combines syntax analysis and code generation in a single phase. Generates a pair of strings in a single derivation.

15 Synchronous Context-Free Grammar Production Rule QUERY → What is CITY / answer(CITY) (natural-language side / formal-language side)

16 Synchronous Context-Free Grammar Derivation
QUERY → What is CITY / answer(CITY)
CITY → the capital CITY / capital(CITY)
CITY → of STATE / loc_2(STATE)
STATE → Ohio / stateid('ohio')
A single derivation pairs "What is the capital of Ohio" with answer(capital(loc_2(stateid('ohio')))).
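The sketch below expands the NL and MR sides of the four rules above in lockstep. The dictionary encoding (and the renamed nonterminal CITY1, used to keep the two CITY rules distinct) is an illustrative assumption, not WASP's implementation.

```python
# Minimal sketch of a synchronous derivation with the rules from the slide.
# Each production rewrites one nonterminal into an (NL, MR) pair; nonterminals
# are expanded on both sides simultaneously.
RULES = {
    "QUERY": (["What", "is", "CITY"],      "answer(CITY)"),
    "CITY":  (["the", "capital", "CITY1"], "capital(CITY1)"),   # CITY -> the capital CITY
    "CITY1": (["of", "STATE"],             "loc_2(STATE)"),     # CITY -> of STATE
    "STATE": (["Ohio"],                    "stateid('ohio')"),
}

def derive(symbol):
    """Expand `symbol`, returning the (NL string, MR string) of the derivation."""
    nl_rhs, mr = RULES[symbol]
    nl_out = []
    for tok in nl_rhs:
        if tok in RULES:                       # nonterminal: expand recursively
            sub_nl, sub_mr = derive(tok)
            nl_out.append(sub_nl)
            mr = mr.replace(tok, sub_mr, 1)    # substitute on the MR side too
        else:
            nl_out.append(tok)
    return " ".join(nl_out), mr

print(derive("QUERY"))
# ('What is the capital of Ohio', "answer(capital(loc_2(stateid('ohio'))))")
```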

17 Probabilistic Parsing Model Derivation d1 of "capital of Ohio": CITY → capital CITY / capital(CITY), CITY → of STATE / loc_2(STATE), STATE → Ohio / stateid('ohio'), yielding capital(loc_2(stateid('ohio')))

18 Probabilistic Parsing Model Derivation d2 of "capital of Ohio": CITY → capital CITY / capital(CITY), CITY → of RIVER / loc_2(RIVER), RIVER → Ohio / riverid('ohio'), yielding capital(loc_2(riverid('ohio')))

19 Probabilistic Parsing Model Each rule has a weight λ, and competing derivations are scored with a maximum-entropy model: Pr(d1 | "capital of Ohio") = exp(Σi λi fi(d1)) / Z and Pr(d2 | "capital of Ohio") = exp(Σi λi fi(d2)) / Z, where fi counts the uses of rule i in a derivation and Z is a normalization constant.
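Since the actual feature weights on this slide were shown graphically, the sketch below scores the two competing derivations with invented weights; WASP learns the real λ values from the training data.

```python
import math

# Illustrative log-linear scoring of derivations d1 and d2 above:
# Pr(d | e) = exp(sum_i lambda_i * f_i(d)) / Z, with f_i counting rule uses.
weights = {                                      # invented weights for the example
    "STATE -> Ohio / stateid('ohio')":      1.2,
    "RIVER -> Ohio / riverid('ohio')":      0.4,
    "CITY -> capital CITY / capital(CITY)": 0.9,
    "CITY -> of STATE / loc_2(STATE)":      0.7,
    "CITY -> of RIVER / loc_2(RIVER)":      0.3,
}

d1 = ["CITY -> capital CITY / capital(CITY)",
      "CITY -> of STATE / loc_2(STATE)",
      "STATE -> Ohio / stateid('ohio')"]
d2 = ["CITY -> capital CITY / capital(CITY)",
      "CITY -> of RIVER / loc_2(RIVER)",
      "RIVER -> Ohio / riverid('ohio')"]

def score(derivation):
    return math.exp(sum(weights[r] for r in derivation))

Z = score(d1) + score(d2)                        # normalize over competing derivations
print("Pr(d1 | 'capital of Ohio') =", score(d1) / Z)
print("Pr(d2 | 'capital of Ohio') =", score(d2) / Z)
```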

20 Overview of WASP Training: from a training set {(e,f)} and an unambiguous CFG of the MRL, lexical acquisition produces a lexicon L (an SCFG), and parameter estimation then yields an SCFG parameterized by λ. Testing: the learned SCFG maps an input sentence e' to an output MR f'.

21 Tactical Generation Can be seen as the inverse of semantic parsing: semantic parsing maps "The goalie should always stay in our half" to ((true) (do our {1} (pos (half our)))), and tactical generation maps the MR back to the sentence.

22 Generation by Inverting WASP The same synchronous grammar is used for both generation and semantic parsing, e.g. QUERY → What is CITY / answer(CITY). For semantic parsing the NL side is the input and the MRL side the output; for generation the roles are reversed.
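A toy illustration of this bidirectionality: one synchronous rule applied in either direction. The string-template encoding is an assumption for illustration only; WASP uses full chart parsing over many rules.

```python
import re

# One synchronous rule: (nonterminal, NL template, MR template). Parsing
# matches the NL side and emits the MR side; generation does the reverse.
rule = ("QUERY", "What is {CITY}", "answer({CITY})")

def parse(sentence):
    """NL -> MR using the rule's NL side as a pattern."""
    m = re.fullmatch(rule[1].format(CITY="(.+)"), sentence)
    return rule[2].format(CITY=m.group(1)) if m else None

def generate(mr):
    """MR -> NL using the rule's MR side as a pattern."""
    pattern = re.escape(rule[2]).replace(re.escape("{CITY}"), "(.+)")
    m = re.fullmatch(pattern, mr)
    return rule[1].format(CITY=m.group(1)) if m else None

print(parse("What is the capital of Ohio"))
# answer(the capital of Ohio)   -- a full parser would recurse into CITY
print(generate("answer(capital(loc_2(stateid('ohio'))))"))
# What is capital(loc_2(stateid('ohio')))  -- likewise for generation
```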

23 Learning Language from Perceptual Context Children do not learn language from annotated corpora. Neither do they learn language from just reading the newspaper, surfing the web, or listening to the radio. The natural way to learn language is to perceive language in the context of its use in the physical and social world. This requires inferring the meaning of utterances from their perceptual context.

24 Language Grounding The meanings of many words are grounded in our perception of the physical world: red, ball, cup, run, hit, fall, etc. –Symbol Grounding: Harnad (1990) Even many abstract words and meanings are metaphorical abstractions of terms grounded in the physical world: up, down, over, in, etc. –Lakoff and Johnson's Metaphors We Live By It's difficult to put my ideas into words. Interest in competitions is up. Most work in NLP tries to represent meaning without any connection to perception or to the physical world, circularly defining the meanings of words in terms of other words or meaningless symbols with no firm foundation.

25 ??? “Mary is on the phone”

Ambiguous Supervision for Learning Semantic Parsers A computer system simultaneously exposed to perceptual contexts and natural language utterances should be able to learn the underlying language semantics. We consider ambiguous training data of sentences associated with multiple potential MRs. –Siskind (1996) uses this type of "referentially uncertain" training data to learn meanings of words. Extracting meaning representations from perceptual data is a difficult unsolved problem. –Our system works directly with symbolic MRs.

27 “Mary is on the phone” ???

28 “Mary is on the phone” ???

29 Ironing(Mommy, Shirt) “Mary is on the phone” ???

30 Ironing(Mommy, Shirt) Working(Sister, Computer) “Mary is on the phone” ???

31 Ironing(Mommy, Shirt) Working(Sister, Computer) Carrying(Daddy, Bag) “Mary is on the phone” ???

32 Ironing(Mommy, Shirt) Working(Sister, Computer) Carrying(Daddy, Bag) Talking(Mary, Phone) Sitting(Mary, Chair) “Mary is on the phone” Ambiguous Training Example ???

33 Ironing(Mommy, Shirt) Working(Sister, Computer) Talking(Mary, Phone) Sitting(Mary, Chair) “Mommy is ironing a shirt” Next Ambiguous Training Example ???

Ambiguous Supervision for Learning Semantic Parsers (cont'd) Our model of ambiguous supervision corresponds to the type of data that would be gathered from a temporal sequence of perceptual contexts with occasional language commentary. We assume each sentence has exactly one meaning in its perceptual context. –Recently extended to handle sentences with no meaning in their perceptual context. Each meaning is associated with at most one sentence.

35 Sample Ambiguous Corpus Daisy gave the clock to the mouse. Mommy saw that Mary gave the hammer to the dog. The dog broke the box. John gave the bag to the mouse. The dog threw the ball. ate(mouse, orange) gave(daisy, clock, mouse) ate(dog, apple) saw(mother, gave(mary, dog, hammer)) broke(dog, box) gave(woman, toy, mouse) gave(john, bag, mouse) threw(dog, ball) runs(dog) saw(john, walks(man, dog)) Forms a bipartite graph
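Represented as data, the ambiguous corpus is simply a mapping from each sentence to its set of candidate MRs, i.e. the bipartite graph mentioned on the slide. The sketch below uses a few of the examples above; the exact candidate sets are illustrative, since in the real data they come from the surrounding perceptual context.

```python
# The ambiguous corpus as a bipartite graph: each NL sentence is linked to
# every MR that occurred in its perceptual context (candidate sets invented).
ambiguous_corpus = {
    "Daisy gave the clock to the mouse.":
        {"gave(daisy, clock, mouse)", "ate(mouse, orange)", "ate(dog, apple)"},
    "Mommy saw that Mary gave the hammer to the dog.":
        {"saw(mother, gave(mary, dog, hammer))", "ate(dog, apple)"},
    "The dog broke the box.":
        {"broke(dog, box)", "gave(woman, toy, mouse)", "runs(dog)"},
    "John gave the bag to the mouse.":
        {"gave(john, bag, mouse)", "gave(woman, toy, mouse)"},
    "The dog threw the ball.":
        {"threw(dog, ball)", "runs(dog)", "saw(john, walks(man, dog))"},
}

# Edges of the bipartite graph: (sentence, candidate MR) pairs.
edges = [(s, mr) for s, mrs in ambiguous_corpus.items() for mr in mrs]
print(len(edges), "sentence-MR edges")
```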

KRISPER: KRISP with EM-like Retraining Extension of KRISP that learns from ambiguous supervision. Uses an iterative EM-like method to gradually converge on a correct meaning for each sentence.

37 saw(john, walks(man, dog)) KRISPER’s Training Algorithm Daisy gave the clock to the mouse. Mommy saw that Mary gave the hammer to the dog. The dog broke the box. John gave the bag to the mouse. The dog threw the ball. ate(mouse, orange) gave(daisy, clock, mouse) ate(dog, apple) saw(mother, gave(mary, dog, hammer)) broke(dog, box) gave(woman, toy, mouse) gave(john, bag, mouse) threw(dog, ball) runs(dog) 1. Assume every possible meaning for a sentence is correct

39 saw(john, walks(man, dog)) KRISPER's Training Algorithm Daisy gave the clock to the mouse. Mommy saw that Mary gave the hammer to the dog. The dog broke the box. John gave the bag to the mouse. The dog threw the ball. ate(mouse, orange) gave(daisy, clock, mouse) ate(dog, apple) saw(mother, gave(mary, dog, hammer)) broke(dog, box) gave(woman, toy, mouse) gave(john, bag, mouse) threw(dog, ball) runs(dog) 2. Resulting NL-MR pairs are weighted and given to KRISP (uniform edge weights such as 1/2, 1/4, 1/5, 1/3 shown on the graph)

40 saw(john, walks(man, dog)) KRISPER’s Training Algorithm Daisy gave the clock to the mouse. Mommy saw that Mary gave the hammer to the dog. The dog broke the box. John gave the bag to the mouse. The dog threw the ball. ate(mouse, orange) gave(daisy, clock, mouse) ate(dog, apple) saw(mother, gave(mary, dog, hammer)) broke(dog, box) gave(woman, toy, mouse) gave(john, bag, mouse) threw(dog, ball) runs(dog) 3. Estimate the confidence of each NL-MR pair using the resulting trained parser

41 saw(john, walks(man, dog)) KRISPER’s Training Algorithm Daisy gave the clock to the mouse. Mommy saw that Mary gave the hammer to the dog. The dog broke the box. John gave the bag to the mouse. The dog threw the ball. ate(mouse, orange) gave(daisy, clock, mouse) ate(dog, apple) saw(mother, gave(mary, dog, hammer)) broke(dog, box) gave(woman, toy, mouse) gave(john, bag, mouse) threw(dog, ball) runs(dog) 4. Use maximum weighted matching on a bipartite graph to find the best NL-MR pairs [Munkres, 1957]

43 saw(john, walks(man, dog)) KRISPER's Training Algorithm Daisy gave the clock to the mouse. Mommy saw that Mary gave the hammer to the dog. The dog broke the box. John gave the bag to the mouse. The dog threw the ball. ate(mouse, orange) gave(daisy, clock, mouse) ate(dog, apple) saw(mother, gave(mary, dog, hammer)) broke(dog, box) gave(woman, toy, mouse) gave(john, bag, mouse) threw(dog, ball) runs(dog) 5. Give the best pairs to KRISP in the next iteration, and repeat until convergence
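Step 4's maximum weighted matching can be computed with the Hungarian (Munkres, 1957) algorithm. Below is a hedged sketch using scipy.optimize.linear_sum_assignment on a small confidence matrix; the confidence values are invented, whereas KRISPER obtains them from the parser trained in the previous steps.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

sentences = ["Daisy gave the clock to the mouse.",
             "The dog broke the box.",
             "The dog threw the ball."]
mrs = ["gave(daisy, clock, mouse)", "broke(dog, box)", "threw(dog, ball)"]

conf = np.array([      # conf[i, j] = parser confidence that sentence i means MR j
    [0.9, 0.2, 0.1],
    [0.3, 0.7, 0.4],
    [0.1, 0.5, 0.8],
])

# Hungarian / Munkres algorithm; negate because the SciPy routine minimizes cost.
rows, cols = linear_sum_assignment(-conf)
for i, j in zip(rows, cols):
    print(f"{sentences[i]:40s} -> {mrs[j]}")
```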

Results on Ambig-ChildWorld Corpus

45 New Challenge: Learning to Be a Sportscaster Goal: Learn from realistic data of natural language used in a representative context while avoiding difficult issues in computer perception (i.e. speech and vision). Solution: Learn from textually annotated traces of activity in a simulated environment. Example: Traces of games in the Robocup simulator paired with textual sportscaster commentary.

46 Grounded Language Learning in Robocup The Robocup simulator, via simulated perception, produces perceived facts; a sportscaster provides commentary (e.g., "Score!!!!"). The grounded language learner uses the paired facts and commentary to learn an SCFG, which is shared by a semantic parser and a language generator that can itself produce commentary.

47 Robocup Sportscaster Trace Natural Language Commentary: purple7 passes the ball out to purple6 | purple6 passes to purple2 | purple2 makes a short pass to purple3 | purple3 loses the ball to pink9. Meaning Representation: pass(purple7, purple6), ballstopped, kick(purple6), pass(purple6, purple2), ballstopped, kick(purple2), pass(purple2, purple3), kick(purple3), badPass(purple3, pink9), turnover(purple3, pink9)

Sportscasting Data Collected human textual commentary for the 4 RoboCup championship games. –Avg # events/game = 2,613 –Avg # sentences/game = 509 Each sentence was matched to all events within the previous 5 seconds. –Avg # MRs/sentence = 2.5 (min 1, max 12) Manually annotated with correct matchings of sentences to MRs (for evaluation purposes only). 50
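A sketch of how the candidate MR sets were formed: each comment is paired with every event logged in the 5 seconds before it. The timestamps and events below are invented for illustration.

```python
# Pair each commentary sentence with all events in the preceding 5 seconds.
events = [                      # (time in seconds, meaning representation)
    (12.0, "pass(purple7, purple6)"),
    (13.5, "kick(purple6)"),
    (14.0, "pass(purple6, purple2)"),
    (20.0, "badPass(purple3, pink9)"),
]
comments = [                    # (time in seconds, sentence)
    (15.0, "purple6 passes to purple2"),
    (21.0, "purple3 loses the ball to pink9"),
]

WINDOW = 5.0
candidates = {
    sent: [mr for (t_e, mr) in events if 0.0 <= t_c - t_e <= WINDOW]
    for (t_c, sent) in comments
}
for sent, mrs in candidates.items():
    print(f"{sent!r}: {mrs}")
```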

WASPER WASP with EM-like retraining to handle ambiguous training data. Same augmentation as added to KRISP to create KRISPER. 51

KRISPER-WASP First iteration of EM-like training produces very noisy training data (> 50% errors). KRISP is better than WASP at handling noisy training data. –SVM prevents overfitting. –String kernel allows partial matching. But KRISP does not support language generation. First train KRISPER just to determine the best NL→MR matchings. Then train WASP on the resulting unambiguously supervised data. 52

WASPER-GEN In KRISPER and WASPER, the correct MR for each sentence is chosen based on maximizing the confidence of semantic parsing (NL→MR). Instead, WASPER-GEN determines the best matching based on generation (MR→NL). Score each potential NL/MR pair by using the currently trained WASP⁻¹ generator. Compute NIST MT score (alternative to BLEU score) between the generated sentence and the potential matching sentence. 53
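The sketch below shows the matching-by-generation idea: for each candidate MR, generate a sentence and score it against the observed comment. The tiny dictionary generator stands in for the trained WASP⁻¹ model, and token overlap stands in for the NIST score used in the real system; both are illustrative assumptions.

```python
def generate(mr):                       # stand-in for the WASP^-1 generator
    stub = {
        "pass(purple6, purple2)": "purple6 passes the ball to purple2",
        "kick(purple6)":          "purple6 kicks the ball",
        "ballstopped":            "the ball has stopped",
    }
    return stub[mr]

def overlap_score(generated, observed): # stand-in for the NIST MT score
    g, o = set(generated.split()), set(observed.split())
    return len(g & o) / len(g | o)

comment = "purple6 passes to purple2"
candidates = ["pass(purple6, purple2)", "kick(purple6)", "ballstopped"]

# Choose the MR whose generated sentence best matches the observed comment.
best = max(candidates, key=lambda mr: overlap_score(generate(mr), comment))
print("best matching MR:", best)        # -> pass(purple6, purple2)
```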

Strategic Generation Generation requires not only knowing how to say something (tactical generation) but also what to say (strategic generation). For automated sportscasting, one must be able to effectively choose which events to describe. 54

pass ( purple7, purple6 ) ballstopped kick ( purple6 ) pass ( purple6, purple2 ) ballstopped kick ( purple2 ) pass ( purple2, purple3 ) kick ( purple3 ) badPass ( purple3, pink9 ) turnover ( purple3, pink9 ) Example of Strategic Generation 55

Example of Strategic Generation 56 pass ( purple7, purple6 ) ballstopped kick ( purple6 ) pass ( purple6, purple2 ) ballstopped kick ( purple2 ) pass ( purple2, purple3 ) kick ( purple3 ) badPass ( purple3, pink9 ) turnover ( purple3, pink9 )

Learning for Strategic Generation For each event type (e.g. pass, kick) estimate the probability that it is described by the sportscaster. Requires NL/MR matching that indicates which events were described, but this is not provided in the ambiguous training data. –Use estimated matching computed by KRISPER, WASPER or WASPER-GEN. –Use a version of EM to determine the probability of mentioning each event type just based on strategic info. 57

EM for Strategic Generation 58 Natural Language Commentary: purple7 passes the ball out to purple6 | purple6 passes to purple2 | purple2 makes a short pass to purple3 | purple3 loses the ball to pink9. Meaning Representation: pass(purple7, purple6), ballstopped, kick(purple6), pass(purple6, purple2), ballstopped, kick(purple2), pass(purple2, purple3), kick(purple3), badPass(purple3, pink9), turnover(purple3, pink9). Initialize each sentence's candidate links with uniform weights (e.g., 1, 1/4, 1/5).

EM for Strategic Generation 59 Estimate generation probabilities: P(pass) = (1 + 1/4 + 1/4 + 1/4 + 1/5) / 3 = 0.65

EM for Strategic Generation 60 Estimate generation probabilities: P(pass) = 0.65, P(ballstopped) = (1/4 + 1/4) / 2 = 0.25

EM for Strategic Generation 61 Estimate generation probabilities: P(pass) = 0.65, P(ballstopped) = 0.25, P(kick) = (1/4 + 1/4 + 1/5 + 1/5) / 3 = 0.3

EM for Strategic Generation 62 Estimate generation probabilities: P(pass) = 0.65, P(ballstopped) = 0.25, P(kick) = 0.3, P(badpass) = 0.2, P(turnover) = 0.2

EM for Strategic Generation 63 Reassign link weights using P(pass) = 0.65, P(ballstopped) = 0.25, P(kick) = 0.3, P(badpass) = 0.2, P(turnover) = 0.2

EM for Strategic Generation 64 Normalize link weights

EM for Strategic Generation Recalculate Generation Probs and Repeat Until Convergence
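The EM procedure sketched on the preceding slides can be written compactly as follows. The candidate links are a small invented subset of the slide's example, so the resulting probabilities differ from the slide's numbers; the estimate P(type) = total incoming link weight / number of events of that type follows the slides.

```python
from collections import defaultdict

def event_type(mr):
    return mr.split("(")[0]

# sentence -> candidate events in its context window (invented subset).
links = {
    "purple7 passes the ball out to purple6":
        ["pass(purple7, purple6)"],
    "purple6 passes to purple2":
        ["kick(purple6)", "pass(purple6, purple2)", "ballstopped(1)", "kick(purple2)"],
    "purple2 makes a short pass to purple3":
        ["pass(purple2, purple3)", "kick(purple3)", "badPass(purple3, pink9)",
         "turnover(purple3, pink9)", "ballstopped(2)"],
}
all_events = {e for es in links.values() for e in es}

# Start with uniform link weights: each sentence spreads weight 1 over its candidates.
weight = {s: {e: 1.0 / len(es) for e in es} for s, es in links.items()}

p_mention = {}
for _ in range(10):
    # E-like step: P(event type mentioned) = incoming weight / #events of that type.
    num = defaultdict(float)
    den = defaultdict(int)
    for e in all_events:
        den[event_type(e)] += 1
    for ws in weight.values():
        for e, w in ws.items():
            num[event_type(e)] += w
    p_mention = {t: num[t] / den[t] for t in den}

    # M-like step: reassign and normalize link weights by mention probability.
    for s, es in links.items():
        z = sum(p_mention[event_type(e)] for e in es)
        weight[s] = {e: p_mention[event_type(e)] / z for e in es}

print({t: round(p, 2) for t, p in sorted(p_mention.items())})
```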

Demo Game clip commentated using WASPER-GEN with EM-based strategic generation, since this gave the best results for generation. FreeTTS was used to synthesize speech from the textual output.

Experimental Evaluation Generated learning curves by training on all combinations of 1 to 3 games and testing on all games not used for training. Baselines: –Random Matching: WASP trained on random choice of possible MR for each comment. –Gold Matching: WASP trained on correct matching of MR for each comment. Metrics: –Precision: % of system’s annotations that are correct –Recall: % of gold-standard annotations correctly produced –F-measure: Harmonic mean of precision and recall
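For reference, the metrics can be computed from the system's and the gold standard's annotation sets as in the generic sketch below (not tied to any particular experiment above).

```python
# Generic precision / recall / F-measure over sets of annotations.
def prf(system, gold):
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    f = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f

# Toy example: events the system chose to comment on vs. the gold standard.
system = {"pass(purple7, purple6)", "kick(purple6)", "turnover(purple3, pink9)"}
gold   = {"pass(purple7, purple6)", "turnover(purple3, pink9)", "badPass(purple3, pink9)"}
print(prf(system, gold))   # -> (0.667, 0.667, 0.667) approximately
```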

Evaluating Matching Accuracy Measure how accurately various methods assign MRs to sentences in the ambiguous training data. Use gold-standard matches to evaluate correctness.

Results on Matching

Evaluating Semantic Parsing Measure how accurately the learned parser maps sentences to their correct meanings in the test games. Use the gold-standard matches to determine the correct MR for each sentence that has one. The generated MR must exactly match the gold standard to count as correct.

Results on Semantic Parsing

Evaluating Tactical Generation Measure how accurately NL generator produces English sentences for chosen MRs in the test games. Use gold-standard matches to determine the correct sentence for each MR that has one. Use NIST score to compare generated sentence to the one in the gold-standard.

Results on Tactical Generation

Evaluating Strategic Generation In the test games, measure how accurately the system determines which perceived events to comment on. Compare the subset of events chosen by the system to the subset chosen by the human annotator (as given by the gold-standard matching).

Results on Strategic Generation

Human Evaluation (Quasi Turing Test) Asked 4 fluent English speakers to evaluate overall quality of sportscasts. Randomly picked a 2 minute segment from each of the 4 games. Each human judge evaluated 8 commented game clips, each of the 4 segments commented once by a human and once by the machine when tested on that game. The 8 clips presented to each judge were shown in random counter-balanced order. Judges were not told which ones were human or machine generated. 76

Human Evaluation Metrics
Score | English Fluency | Semantic Correctness | Sportscasting Ability
5 | Flawless | Always | Excellent
4 | Good | Usually | Good
3 | Non-native | Sometimes | Average
2 | Disfluent | Rarely | Bad
1 | Gibberish | Never | Terrible
77

Results on Human Evaluation (table comparing the Human and Machine commentators on English Fluency, Semantic Correctness, and Sportscasting Ability)

Immediate Future Directions Use strategic generation information to improve resolution of ambiguous training data. Produce generation confidences (instead of NIST scores) for scoring NL/MR matches in WASPER-GEN. Improve WASP's ability to handle noisy training data. Improve simulated perception to extract more detailed and interesting symbolic facts from the simulator.

Longer Term Future Directions Apply approach to learning situated language in a computer video-game environment (Gorniak & Roy, 2005) –Teach game AIs how to talk to you! Apply approach to captioned images or video using computer vision to extract objects, relations, and events from real perceptual data (Fleischman & Roy, 2007)

81 Conclusions Current language learning work uses expensive, unrealistic training data. We have developed a language learning system that can learn from language paired with an ambiguous perceptual environment. We have evaluated it on the task of learning to sportscast simulated Robocup games. The system learns to sportscast almost as well as humans.