1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

1 1 David Chen & Raymond Mooney, Department of Computer Sciences, University of Texas at Austin. Learning to Sportscast: A Test of Grounded Language Acquisition

2 2 Motivation: Constructing annotated corpora for language learning is difficult. Children acquire language through exposure to linguistic input in the context of a rich, relevant, perceptual environment.

3 3 Goals: Learn to ground the semantics of language. Learn language through correlated linguistic and visual inputs. [Figure label: Block]

4 4 Challenge

6 6 A linguistic input may correspond to many possible events. [Figure labels: Block, ?, ?, ?]

7 7 Overview: Sportscasting task; Tactical generation; Strategic generation; Human evaluation

8 8 Learning to Sportscast: Robocup Simulation League games. No speech recognition –Record commentaries in text form. No computer vision –Rule-based system to automatically extract game events in symbolic form. Concentrate on linguistic issues.

9 9 Robocup Simulation League

10 10 Robocup Simulation League Pink4’s pass was intercepted by Purple6

11 11 Learning to Sportscast: Learn to sportscast by observing sample human sportscasts. Build a function that maps between natural language (NL) and meaning representation (MR) –NL: Textual commentaries about the game –MR: Predicate logic formulas that represent events in the game

12 12 Mapping between NL/MR. NL: “Purple3 passes the ball to Purple5”. MR: Pass ( Purple3, Purple5 ). Semantic Parsing (NL → MR). Tactical Generation (MR → NL).
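The MR notation on this slide (a predicate followed by an optional argument list) can be held in a small data structure. The following is a minimal sketch of reading and writing that notation only; it is not a semantic parser or a tactical generator, and the names `MR`, `parse_mr`, and `format_mr` are hypothetical helpers introduced here for illustration.

```python
import re
from typing import NamedTuple, Tuple


class MR(NamedTuple):
    """A meaning representation: a predicate plus zero or more arguments."""
    predicate: str
    args: Tuple[str, ...]


# Matches "Pass ( Purple3, Purple5 )" as well as argument-free events
# such as "ballstopped".
_MR_PATTERN = re.compile(r"^\s*(\w+)\s*(?:\(\s*(.*?)\s*\))?\s*$")


def parse_mr(text: str) -> MR:
    """Turn an MR string in the slide notation into an MR tuple."""
    match = _MR_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a well-formed MR: {text!r}")
    predicate, arg_text = match.group(1), match.group(2)
    args = tuple(a.strip() for a in arg_text.split(",")) if arg_text else ()
    return MR(predicate, args)


def format_mr(mr: MR) -> str:
    """Write an MR tuple back out in the slide notation."""
    if not mr.args:
        return mr.predicate
    return f"{mr.predicate} ( {', '.join(mr.args)} )"
```

For example, `parse_mr("Pass ( Purple3, Purple5 )")` yields the predicate `Pass` with the two player arguments, and `format_mr` reverses the step.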

13 13 Robocup Sportscaster Trace. Natural Language Commentary: “Purple goalie turns the ball over to Pink8”; “Pink11 looks around for a teammate”; “Pink8 passes the ball to Pink11”; “Purple team is very sloppy today”; “Pink11 makes a long pass to Pink8”; “Pink8 passes back to Pink11”. Meaning Representation: badPass ( Purple1, Pink8 ); turnover ( Purple1, Pink8 ); pass ( Pink11, Pink8 ); pass ( Pink8, Pink11 ); ballstopped; pass ( Pink8, Pink11 ); kick ( Pink11 ); kick ( Pink8 ); kick ( Pink11 ); kick ( Pink8 ).

16 16 Robocup Sportscaster Trace. Natural Language Commentary: “Purple goalie turns the ball over to Pink8”; “Pink11 looks around for a teammate”; “Pink8 passes the ball to Pink11”; “Purple team is very sloppy today”; “Pink11 makes a long pass to Pink8”; “Pink8 passes back to Pink11”. Meaning Representation: P6 ( C1, C19 ); P5 ( C1, C19 ); P2 ( C22, C19 ); P2 ( C19, C22 ); P0; P2 ( C19, C22 ); P1 ( C22 ); P1 ( C19 ); P1 ( C22 ); P1 ( C19 ).

17 17 Robocup Data: Collected human textual commentary for the 4 Robocup championship games from 2001-2004. –Avg # events/game = 2,613 –Avg # sentences/game = 509. Each sentence matched to all events within the previous 5 seconds. –Avg # MRs/sentence = 2.5 (min 1, max 12). Manually annotated with correct matchings of sentences to MRs (for evaluation purposes only).
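The 5-second matching rule is what makes the training data ambiguous: every sentence is paired with all events that precede it closely enough. A minimal sketch of that pairing step follows, assuming each comment and event carries a timestamp in seconds and that the window is inclusive at both ends; the slides state neither detail, and `build_ambiguous_pairs` is a hypothetical name.

```python
from typing import List, Tuple

Event = Tuple[float, str]    # (timestamp in seconds, MR string) -- assumed layout
Comment = Tuple[float, str]  # (timestamp in seconds, sentence)  -- assumed layout


def build_ambiguous_pairs(
    comments: List[Comment],
    events: List[Event],
    window: float = 5.0,
) -> List[Tuple[str, List[str]]]:
    """Pair each sentence with every event from the preceding `window` seconds."""
    pairs = []
    for comment_time, sentence in comments:
        candidates = [
            mr
            for event_time, mr in events
            # Inclusive bounds are an assumption made for this sketch.
            if comment_time - window <= event_time <= comment_time
        ]
        pairs.append((sentence, candidates))
    return pairs
```

Each sentence ends up with a list of candidate MRs, of which at most one is the intended meaning; sentences that describe no extracted event have no correct candidate at all.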

18 18 Overview: Sportscasting task; Tactical generation; Strategic generation; Human evaluation

19 19 Tactical Generation: Learn how to generate NL from MR. Example: Pass(Pink2, Pink3) → “Pink2 kicks the ball to Pink3”. Two steps: 1. Disambiguate the training data 2. Learn a language generator

20 20 System Overview. Components: Sportscaster; Robocup Simulator; Ambiguous Training Data. Sentences: “Purple7 loses the ball to Pink2”; “Pink2 kicks the ball to Pink5”; “Pink5 makes a long pass to Pink8”; “Pink8 shoots the ball”. Candidate MRs: Turnover ( purple7, pink2 ); Pass ( pink5, pink8 ); Pass ( Purple5, Purple7 ); Kick ( pink2 ); Pass ( pink2, pink5 ); Kick ( pink5 ); Ballstopped; Kick ( pink8 ).

21 21 System Overview. Components: Sportscaster; Robocup Simulator; Ambiguous Training Data; Semantic Parser Learner; Initial Semantic Parser. Sentences: “Purple7 loses the ball to Pink2”; “Pink2 kicks the ball to Pink5”; “Pink5 makes a long pass to Pink8”; “Pink8 shoots the ball”. Candidate MRs: Turnover ( purple7, pink2 ); Pass ( pink5, pink8 ); Pass ( Purple5, Purple7 ); Kick ( pink2 ); Pass ( pink2, pink5 ); Kick ( pink5 ); Ballstopped; Kick ( pink8 ).

22 22 System Overview. Components: Sportscaster; Robocup Simulator; Ambiguous Training Data; Initial Semantic Parser; Unambiguous Training Data. Sentences: “Purple7 loses the ball to Pink2”; “Pink2 kicks the ball to Pink5”; “Pink5 makes a long pass to Pink8”; “Pink8 shoots the ball”. Candidate MRs (ambiguous): Turnover ( purple7, pink2 ); Pass ( pink5, pink8 ); Pass ( purple5, purple7 ); Kick ( pink2 ); Pass ( pink2, pink5 ); Kick ( pink5 ); Ballstopped; Kick ( pink8 ). Selected MRs (unambiguous): Kick ( pink8 ); Pass ( pink2, pink5 ); Kick ( pink2 ); Kick ( pink5 ).

23 23 System Overview. Components: Sportscaster; Robocup Simulator; Ambiguous Training Data; Semantic Parser; Unambiguous Training Data; Semantic Parser Learner. Sentences: “Purple7 loses the ball to Pink2”; “Pink2 kicks the ball to Pink5”; “Pink5 makes a long pass to Pink8”; “Pink8 shoots the ball”. Candidate MRs (ambiguous): Turnover ( purple7, pink2 ); Pass ( pink5, pink8 ); Pass ( purple5, purple7 ); Kick ( pink2 ); Pass ( pink2, pink5 ); Kick ( pink5 ); Ballstopped; Kick ( pink8 ). Selected MRs (unambiguous): Kick ( pink8 ); Pass ( pink2, pink5 ); Kick ( pink2 ); Kick ( pink5 ).

24 24 System Overview. Components: Sportscaster; Robocup Simulator; Ambiguous Training Data; Semantic Parser; Unambiguous Training Data; Semantic Parser Learner. Sentences: “Purple7 loses the ball to Pink2”; “Pink2 kicks the ball to Pink5”; “Pink5 makes a long pass to Pink8”; “Pink8 shoots the ball”. Candidate MRs (ambiguous): Turnover ( purple7, pink2 ); Pass ( pink5, pink8 ); Pass ( purple5, purple7 ); Kick ( pink2 ); Pass ( pink2, pink5 ); Kick ( pink5 ); Ballstopped; Kick ( pink8 ). Selected MRs (unambiguous): Kick ( pink8 ); Pass ( pink2, pink5 ); Kick ( pink5 ); Turnover ( purple7, pink2 ).

26 26 System Overview. Components: Sportscaster; Robocup Simulator; Ambiguous Training Data; Semantic Parser; Unambiguous Training Data; Semantic Parser Learner. Sentences: “Purple7 loses the ball to Pink2”; “Pink2 kicks the ball to Pink5”; “Pink5 makes a long pass to Pink8”; “Pink8 shoots the ball”. Candidate MRs (ambiguous): Turnover ( purple7, pink2 ); Pass ( pink5, pink8 ); Pass ( purple5, purple7 ); Kick ( pink2 ); Pass ( pink2, pink5 ); Kick ( pink5 ); Ballstopped; Kick ( pink8 ). Selected MRs (unambiguous): Kick ( pink8 ); Pass ( pink2, pink5 ); Turnover ( purple7, pink2 ); Pass ( pink5, pink8 ).
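The System Overview slides show a retraining loop: train a model on the ambiguous data, use it to pick the best MR for each sentence, retrain on those picks, and repeat. The sketch below illustrates only the shape of that loop. The scorer is a simple word/symbol co-occurrence measure (a Dice coefficient) chosen here as a stand-in; it is not WASP or KRISP, and every function name is hypothetical.

```python
import re
from collections import Counter


def symbols(mr):
    """Split an MR string such as 'Pass ( pink2, pink5 )' into lowercase symbols."""
    return [s.lower() for s in re.findall(r"\w+", mr)]


def words(sentence):
    """Return the set of lowercase word tokens in a sentence."""
    return set(re.findall(r"\w+", sentence.lower()))


def train(pairs):
    """Count word/symbol co-occurrences over (sentence, [MR, ...]) pairs."""
    word_count, symbol_count, joint = Counter(), Counter(), Counter()
    for sentence, mrs in pairs:
        sentence_words = words(sentence)
        for mr in mrs:
            for w in sentence_words:
                word_count[w] += 1
            for s in set(symbols(mr)):
                symbol_count[s] += 1
                for w in sentence_words:
                    joint[w, s] += 1
    return word_count, symbol_count, joint


def score(model, sentence, mr):
    """Average, over MR symbols, of the best Dice association with any word."""
    word_count, symbol_count, joint = model
    syms = symbols(mr)
    total = 0.0
    for s in syms:
        best = 0.0
        for w in words(sentence):
            denom = word_count[w] + symbol_count[s]
            if denom:
                best = max(best, 2.0 * joint[w, s] / denom)
        total += best
    return total / len(syms) if syms else 0.0


def disambiguate(ambiguous, iterations=5):
    """Iteratively retrain and re-pick one MR per sentence.

    `ambiguous` is a list of (sentence, candidate MRs); every candidate
    list is assumed to be non-empty.
    """
    current = [(sentence, list(cands)) for sentence, cands in ambiguous]
    for _ in range(iterations):
        model = train(current)  # retrain on the current best guesses
        current = [
            (sentence, [max(cands, key=lambda mr: score(model, sentence, mr))])
            for sentence, cands in ambiguous
        ]
    return [(sentence, mrs[0]) for sentence, mrs in current]
```

The first pass trains on all candidates at once, which mirrors the "Initial Semantic Parser" step on the slides; later passes train only on the selected pairs. The fixed iteration count is an arbitrary choice for the sketch.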

27 27 Semantic Parser Learners: Learn a function from NL to MR. NL: “Purple3 passes the ball to Purple5”. MR: Pass ( Purple3, Purple5 ). Semantic Parsing (NL → MR). Tactical Generation (MR → NL). We experiment with two semantic parser learners –WASP (Wong & Mooney, 2006; 2007) –KRISP (Kate & Mooney, 2006)

28 28 WASP: Word Alignment-based Semantic Parsing. Uses statistical machine translation techniques –Synchronous context-free grammars (SCFG) (Wu, 1997; Melamed, 2004; Chiang, 2005) –Word alignments (Brown et al., 1993; Och & Ney, 2003). Capable of both semantic parsing and tactical generation.

29 29 KRISP: Kernel-based Robust Interpretation by Semantic Parsing. Productions of the MR language are treated like semantic concepts. An SVM classifier is trained for each production with a string subsequence kernel. These classifiers are used to compositionally build MRs of the sentences. More resistant to noisy supervision but incapable of tactical generation.

30 30 Matching: Ability to find correct NL/MR pair. 4 Robocup championship games from 2001-2004. –Avg # events/game = 2,613 –Avg # sentences/game = 509. Leave-one-game-out cross-validation. Metric: –Precision: % of system’s annotations that are correct –Recall: % of gold-standard annotations correctly produced –F-measure: Harmonic mean of precision and recall
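The three matching metrics can be computed directly from two sets of (sentence, MR) pairs. The following is a minimal sketch, assuming the system output and the gold standard are both given as collections of hashable pairs; `precision_recall_f` is a hypothetical helper name.

```python
def precision_recall_f(system_pairs, gold_pairs):
    """Precision, recall, and F-measure over sets of (sentence, MR) pairs."""
    system, gold = set(system_pairs), set(gold_pairs)
    correct = len(system & gold)
    # Precision: share of the system's annotations that are correct.
    precision = correct / len(system) if system else 0.0
    # Recall: share of gold-standard annotations the system produced.
    recall = correct / len(gold) if gold else 0.0
    # F-measure: harmonic mean of the two.
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

Returning 0.0 when a set is empty is a convention chosen for this sketch so the function never divides by zero.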

31 31 Systems
System                          Learner
KRISPER (Kate & Mooney, 2007)   KRISP
WASPER                          WASP
WASPER-GEN                      WASP’s language generator

32 32 KRISPER and WASPER. Components: Sportscaster; Robocup Simulator; Ambiguous Training Data; Semantic Parser; Unambiguous Training Data; Semantic Parser Learner (KRISP/WASP). Sentences: “Purple7 loses the ball to Pink2”; “Pink2 kicks the ball to Pink5”; “Pink5 makes a long pass to Pink8”; “Pink8 shoots the ball”. Candidate MRs (ambiguous): Turnover ( purple7, pink2 ); Pass ( pink5, pink8 ); Pass ( purple5, purple7 ); Kick ( pink2 ); Pass ( pink2, pink5 ); Kick ( pink5 ); Ballstopped; Kick ( pink8 ). Selected MRs (unambiguous): Kick ( pink8 ); Pass ( pink2, pink5 ); Kick ( pink5 ); Turnover ( purple7, pink2 ).

34 34 WASPER-GEN. Components: Sportscaster; Robocup Simulator; Ambiguous Training Data; Tactical Generator; Unambiguous Training Data; Tactical Generator Learner (WASP). Sentences: “Purple7 loses the ball to Pink2”; “Pink2 kicks the ball to Pink5”; “Pink5 makes a long pass to Pink8”; “Pink8 shoots the ball”. Candidate MRs (ambiguous): Turnover ( purple7, pink2 ); Pass ( pink5, pink8 ); Pass ( purple5, purple7 ); Kick ( pink2 ); Pass ( pink2, pink5 ); Kick ( pink5 ); Ballstopped; Kick ( pink8 ). Selected MRs (unambiguous): Kick ( pink8 ); Pass ( pink2, pink5 ); Kick ( pink5 ); Turnover ( purple7, pink2 ).

35 35 Matching Results

36 36 Overview: Sportscasting task; Tactical generation; Strategic generation; Human evaluation

37 37 Strategic Generation Generation requires not only knowing how to say something (tactical generation) but also what to say (strategic generation). For automated sportscasting, one must be able to effectively choose which events to describe.

38 38 Example of Strategic Generation: pass ( purple7, purple6 ); ballstopped; kick ( purple6 ); pass ( purple6, purple2 ); ballstopped; kick ( purple2 ); pass ( purple2, purple3 ); kick ( purple3 ); badPass ( purple3, pink9 ); turnover ( purple3, pink9 )

40 40 Strategic Generation: For each event type (e.g. pass, kick) estimate the probability that it is described by the sportscaster. Requires correct NL/MR matching –Use estimated matching from tactical generation –Iterative Generation Strategy Learning
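Given a matching between sentences and events, the per-event-type probability is a ratio of counts: how many events of that type were commented on, out of how many occurred. The following is a minimal sketch, assuming events are MR strings in the slide notation and that the matching (estimated or gold) has already been computed; the function names are hypothetical.

```python
from collections import Counter


def event_type(mr):
    """Return the predicate of an MR string, e.g. 'pass' for 'pass ( a, b )'."""
    return mr.split("(")[0].strip()


def comment_probabilities(all_events, commented_events):
    """Estimate, per event type, the fraction of events that were commented on."""
    totals = Counter(event_type(mr) for mr in all_events)
    commented = Counter(event_type(mr) for mr in commented_events)
    return {etype: commented[etype] / totals[etype] for etype in totals}
```

With the ten events from the strategic generation example slide and a hypothetical matching in which two of the three passes and the one turnover were commented on, `pass` gets 2/3, `turnover` gets 1.0, and `kick`, `ballstopped`, and `badPass` get 0.0.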

41 41 Iterative Generation Strategy Learning (IGSL): Directly estimates the likelihood of an event being commented on. Self-training iterations to improve estimates. Uses events not associated with any NL as negative evidence.
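The slide gives IGSL only in outline, so the sketch below is one plausible reading rather than the authors' exact procedure. It assumes that each sentence distributes one unit of credit among the event types of its candidate MRs in proportion to the current estimates, and that the credit for each type is then divided by the total number of times that type occurred in the game. Events never paired with a sentence enter only through that total, which is how they act as negative evidence. The function name, the update rule, and the fixed iteration count are all assumptions.

```python
from collections import Counter


def igsl_sketch(total_counts, sentence_event_types, iterations=10):
    """Iteratively estimate P(event type is commented on).

    total_counts: event type -> number of occurrences in the whole game,
        including events never paired with any sentence.
    sentence_event_types: one list per sentence holding the event types of
        its candidate MRs; every type must also appear in total_counts.
    """
    probs = {etype: 1.0 for etype in total_counts}  # assumed initialization
    for _ in range(iterations):
        match_count = Counter()
        for types in sentence_event_types:
            mass = sum(probs[t] for t in types)
            if mass == 0.0:
                continue  # no candidate currently has any support
            for t in types:
                # Share one unit of credit according to current estimates.
                match_count[t] += probs[t] / mass
        probs = {
            etype: min(match_count[etype] / total_counts[etype], 1.0)
            for etype in total_counts
        }
    return probs
```

An event type that occurs often but rarely appears near a comment ends up with a low estimate, while a type that never appears among any sentence's candidates gets 0.0.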

42 42 Strategic Generation Performance: Evaluate how well the system can predict which events a human comments on. Metric: –Precision: % of system’s annotations that are correct –Recall: % of gold-standard annotations correctly produced –F-measure: Harmonic mean of precision and recall

43 43 Strategic Generation Results

44 44 Overview: Sportscasting task; Tactical generation; Strategic generation; Human evaluation

45 45 Human Evaluation (Quasi Turing Test): 4 fluent English speakers as judges. 8 commented game clips –2-minute clips randomly selected from each of the 4 games –Each clip commented once by a human, and once by the machine. Presented in random counter-balanced order. Judges were not told which ones were human or machine generated.

46 46 Demo Clip: Game clip commentated using WASPER-GEN with IGSL, since this gave the best results for generation. FreeTTS was used to synthesize speech from textual output.

47 47 Human Evaluation
Score  English Fluency  Semantic Correctness  Sportscasting Ability
5      Flawless         Always                Excellent
4      Good             Usually               Good
3      Non-native       Sometimes             Average
2      Disfluent        Rarely                Bad
1      Gibberish        Never                 Terrible

48 48 Human Evaluation
Commentator  English Fluency  Semantic Correctness  Sportscasting Ability
Human        3.94             4.25                  3.63
Machine      3.44             3.56                  2.94
Difference   0.5              0.69                  0.69

Score  English Fluency  Semantic Correctness  Sportscasting Ability
5      Flawless         Always                Excellent
4      Good             Usually               Good
3      Non-native       Sometimes             Average
2      Disfluent        Rarely                Bad
1      Gibberish        Never                 Terrible

49 49 Future Work: Expand MRs beyond simple logic formulas. Apply approach to learning situated language in a computer video-game environment (Gorniak & Roy, 2005). Apply approach to captioned images or video using computer vision to extract objects, relations, and events from real perceptual data (Fleischman & Roy, 2007).

50 50 Conclusion Current language learning work uses expensive, unrealistic training data. We have developed a language learning system that can learn from language paired with an ambiguous perceptual environment. We have evaluated it on the task of learning to sportscast simulated Robocup games. The system learns to sportscast almost as well as humans.

51 51 Backup Slides

52 52 Systems
System                          Disambiguation             Learning language generator
WASP                            Random                     WASP
KRISPER (Kate & Mooney, 2007)   KRISP                      N/A
WASPER                          WASP                       WASP
KRISPER-WASP                    KRISP                      WASP
WASPER-GEN                      WASP’s language generator  WASP
WASP with gold matching         N/A                        WASP

53 53 Systems
System                          Disambiguation             Learning language generator
WASP                            Random                     WASP                          (Lower baseline)
KRISPER (Kate & Mooney, 2007)   KRISP                      N/A
WASPER                          WASP                       WASP
KRISPER-WASP                    KRISP                      WASP
WASPER-GEN                      WASP’s language generator  WASP
WASP with gold matching         N/A                        WASP                          (Upper baseline)

58 58 Systems: table repeated from slide 53 (with Lower baseline and Upper baseline), here labeled “Matching”.

59 59 Matching: 4 Robocup championship games from 2001-2004. –Avg # events/game = 2,613 –Avg # sentences/game = 509. Leave-one-game-out cross-validation. Metric: –Precision: % of system’s annotations that are correct –Recall: % of gold-standard annotations correctly produced –F-measure: Harmonic mean of precision and recall

60 60 Matching Results

62 62 Systems: table repeated from slide 53 (with Lower baseline and Upper baseline), here labeled “Tactical Generation”.

63 63 Tactical Generation: 4 Robocup championship games from 2001-2004. –Avg # events/game = 2,613 –Avg # sentences/game = 509. Leave-one-game-out cross-validation. NIST score –Evaluate the quality of machine translations based on matching n-grams –BLEU metric with some modifications –More weight for rarer n-grams –Less sensitive to translation length
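The "more weight for rarer n-grams" point refers to the information weight that the NIST metric assigns to each n-gram, computed from counts in the reference sentences. The sketch below shows only that weighting, not the full NIST score (n-gram matching against system output and the length penalty are omitted). Whitespace tokenization and the helper names are assumptions.

```python
import math
from collections import Counter


def ngram_counts(sentences, max_n):
    """Count all n-grams up to length max_n in whitespace-tokenized sentences."""
    counts = Counter()
    total_words = 0
    for sentence in sentences:
        tokens = sentence.lower().split()
        total_words += len(tokens)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts, total_words


def info_weight(ngram, counts, total_words):
    """Information weight: log2(count of the (n-1)-gram prefix / count of the n-gram).

    For a unigram the numerator is the total number of reference words.
    The n-gram must occur in the references, otherwise its count is zero.
    """
    numerator = total_words if len(ngram) == 1 else counts[ngram[:-1]]
    return math.log2(numerator / counts[ngram])
```

A word that appears once in the references carries more weight than one that appears in most sentences, and a bigram that always follows its first word carries no extra information.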

64 64 Tactical Generation Results

65 65 Parsing Results

66 66 Matching Results

67 67 Parse Results

68 68 Generation Results

69 69 Strategic Generation Results

70 70 Grounded Language Learning in Robocup. [Figure components: Robocup Simulator; Sportscaster; Simulated Perception; Perceived Facts; Grounded Language Learner; Language Generator; Semantic Parser; SCFG; example utterance “Score!!!!”]

71 71 “Mary is on the phone” ???

73 73 “Mary is on the phone” ??? Ironing(Mommy, Shirt)

74 74 “Mary is on the phone” ??? Ironing(Mommy, Shirt) Working(Sister, Computer)

75 75 “Mary is on the phone” ??? Ironing(Mommy, Shirt) Working(Sister, Computer) Carrying(Daddy, Bag)

76 76 “Mary is on the phone” ??? Ironing(Mommy, Shirt) Working(Sister, Computer) Carrying(Daddy, Bag) Talking(Mary, Phone) Sitting(Mary, Chair)

77 77 Grounding Language “Spanish goalkeeper Iker Casillas blocks the ball”

79 79 Grounding Language: Blocks. WordNet - (v) parry, block, deflect (impede the movement of (an opponent or a ball)) “block an attack”. Merriam-Webster - intransitive verb: to block an opponent in sports

81 81 Grounding Language Blocks

82 82 Robocup Sportscaster Trace. Natural Language Commentary: purple6 passes to purple2; purple3 loses the ball to pink9; purple2 makes a short pass to purple3. Meaning Representation: ballstopped; kick ( purple6 ); pass ( purple6, purple2 ); turnover ( purple3, pink9 ); kick ( purple2 ); pass ( purple2, purple3 ); kick ( purple3 ).

83 83 Robocup Sportscaster Trace. Natural Language Commentary: purple6 passes to purple2; purple3 loses the ball to pink9; purple2 makes a short pass to purple3. Meaning Representation: pass ( purple6, purple2 ); pass ( purple2, purple3 ).

84 84 Robocup Sportscaster Trace. Natural Language Commentary: purple6 passes to purple2; purple3 loses the ball to pink9; purple2 makes a short pass to purple3. Meaning Representation: kick ( purple6 ); kick ( purple2 ); kick ( purple3 ).

85 85 Robocup Sportscaster Trace. Natural Language Commentary: purple6 passes to purple2; purple3 loses the ball to pink9; purple2 makes a short pass to purple3. Meaning Representation: kick ( purple3 ); ballstopped; kick ( purple6 ); pass ( purple6, purple2 ); kick ( purple2 ); turnover ( purple3, pink9 ); kick ( purple2 ); pass ( purple2, purple3 ); kick ( purple3 ); kick (purple 3

86 86 Robocup Sportscaster Trace. Natural Language Commentary: purple6 passes to purple2; purple3 loses the ball to pink9; purple2 makes a short pass to purple3. Meaning Representation: kick ( purple3 ); kick ( purple6 ); kick ( purple2 ); kick ( purple3 ); kick (purple 3

87 87 Robocup Sportscaster Trace. Natural Language Commentary: purple6 passes to purple2; purple3 loses the ball to pink9; purple2 makes a short pass to purple3. Meaning Representation: kick ( purple3 ); kick ( purple6 ); kick ( purple2 ); kick ( purple3 ). Label: Negative Evidence.

