Learning Language from its Perceptual Context
Ray Mooney
Department of Computer Sciences, University of Texas at Austin
Joint work with David Chen, Rohit Kate, and Yuk Wah Wong
Current State of Natural Language Learning
Most current state-of-the-art NLP systems are constructed by training on large supervised corpora:
–Syntactic Parsing: Penn Treebank
–Word Sense Disambiguation: SenseEval
–Semantic Role Labeling: PropBank
–Machine Translation: Hansards corpus
Constructing such annotated corpora is difficult, expensive, and time consuming.
Semantic Parsing
A semantic parser maps a natural-language sentence to a complete, detailed semantic representation: a logical form or meaning representation (MR).
For many applications, the desired output is immediately executable by another program.
Two application domains:
–GeoQuery: A Database Query Application
–CLang: RoboCup Coach Language
GeoQuery: A Database Query Application
Query application for a U.S. geography database [Zelle & Mooney, 1996]
User: How many states does the Mississippi run through?
Semantic Parsing → Query: answer(A, count(B, (state(B), C=riverid(mississippi), traverse(C,B)), A))
DataBase → Answer: 10
CLang: RoboCup Coach Language
In the RoboCup Coach competition, teams compete to coach simulated soccer players.
The coaching instructions are given in a formal language called CLang.
[Figure: simulated soccer field]
Coach: If the ball is in our penalty area, then all our players except player 4 should stay in our half.
Semantic Parsing → CLang: ((bpos (penalty-area our)) (do (player-except our {4}) (pos (half our))))
Learning Semantic Parsers
Manually programming robust semantic parsers is difficult due to the complexity of the task.
Semantic parsers can be learned automatically from sentences paired with their logical forms.
[Figure: NL/MR training examples → Semantic-Parser Learner → Semantic Parser, which maps Natural Language → Meaning Representation]
Our Semantic-Parser Learners
CHILL+WOLFIE (Zelle & Mooney, 1996; Thompson & Mooney, 1999, 2003)
–Separates parser learning and semantic-lexicon learning.
–Learns a deterministic parser using ILP techniques.
COCKTAIL (Tang & Mooney, 2001)
–Improved ILP algorithm for CHILL.
SILT (Kate, Wong & Mooney, 2005)
–Learns symbolic transformation rules for mapping directly from NL to MR.
SCISSOR (Ge & Mooney, 2005)
–Integrates semantic interpretation into Collins’ statistical syntactic parser.
WASP (Wong & Mooney, 2006; 2007)
–Uses syntax-based statistical machine translation methods.
KRISP (Kate & Mooney, 2006)
–Uses a series of SVM classifiers employing a string kernel to iteratively build semantic representations.
WASP: A Machine Translation Approach to Semantic Parsing
Uses statistical machine translation techniques:
–Synchronous context-free grammars (SCFG) (Wu, 1997; Melamed, 2004; Chiang, 2005)
–Word alignments (Brown et al., 1993; Och & Ney, 2003)
Hence the name: Word Alignment-based Semantic Parsing
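A Unifying Framework for Parsing and Generation
[Figure: natural languages and formal languages linked by translation tasks: machine translation (between natural languages), semantic parsing (natural → formal), tactical generation (formal → natural), and compiling (between formal languages; Aho & Ullman, 1972), all viewed as synchronous parsing]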
Synchronous Context-Free Grammars (SCFG)
Developed by Aho & Ullman (1972) as a theory of compilers that combines syntax analysis and code generation in a single phase.
Generates a pair of strings in a single derivation.
Synchronous Context-Free Grammar Production Rule
QUERY → What is CITY / answer(CITY)
(left of the slash: natural language; right of the slash: formal language)
Synchronous Context-Free Grammar Derivation
Rules used:
QUERY → What is CITY / answer(CITY)
CITY → the capital CITY / capital(CITY)
CITY → of STATE / loc_2(STATE)
STATE → Ohio / stateid('ohio')
Paired output of the single derivation:
NL: What is the capital of Ohio
MR: answer(capital(loc_2(stateid('ohio'))))
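The derivation can be mimicked with a small Python sketch. It assumes a toy encoding (hypothetical, not WASP's internal format) in which each rule is a (left-hand side, NL template, MR template) triple, upper-case tokens are co-indexed nonterminals, and a derivation is a tree of rule applications; `Rule`, `Derivation`, and `yield_pair` are illustrative names. Expanding a nonterminal on one side always expands its partner on the other, so one derivation yields both strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    """Synchronous production LHS -> nl / mr (hypothetical toy encoding)."""
    lhs: str
    nl: Tuple[str, ...]  # NL template; upper-case tokens are nonterminals
    mr: Tuple[str, ...]  # MR template using the same nonterminals


@dataclass
class Derivation:
    rule: Rule
    # One sub-derivation per nonterminal (assumes each appears once per rule).
    children: Dict[str, "Derivation"] = field(default_factory=dict)


def yield_pair(d: Derivation) -> Tuple[List[str], List[str]]:
    """Expand a derivation into its paired (NL tokens, MR tokens)."""
    sub = {nt: yield_pair(child) for nt, child in d.children.items()}
    nl = [w for tok in d.rule.nl for w in (sub[tok][0] if tok in sub else [tok])]
    mr = [w for tok in d.rule.mr for w in (sub[tok][1] if tok in sub else [tok])]
    return nl, mr


QUERY = Rule("QUERY", ("What", "is", "CITY"), ("answer", "(", "CITY", ")"))
CAPITAL = Rule("CITY", ("the", "capital", "CITY"), ("capital", "(", "CITY", ")"))
OF_STATE = Rule("CITY", ("of", "STATE"), ("loc_2", "(", "STATE", ")"))
OHIO = Rule("STATE", ("Ohio",), ("stateid", "(", "'ohio'", ")"))

tree = Derivation(QUERY, {"CITY": Derivation(CAPITAL, {
    "CITY": Derivation(OF_STATE, {"STATE": Derivation(OHIO)})})})
nl_tokens, mr_tokens = yield_pair(tree)
print(" ".join(nl_tokens))  # What is the capital of Ohio
print("".join(mr_tokens))   # answer(capital(loc_2(stateid('ohio'))))
```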
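Probabilistic Parsing Model
The phrase "capital of Ohio" has two candidate derivations:
d1: STATE → Ohio / stateid('ohio'); CITY → capital CITY / capital(CITY); CITY → of STATE / loc_2(STATE), yielding capital(loc_2(stateid('ohio')))
d2: RIVER → Ohio / riverid('ohio'); CITY → capital CITY / capital(CITY); CITY → of RIVER / loc_2(RIVER), yielding capital(loc_2(riverid('ohio')))
Each derivation is scored by a log-linear model with weights λ over features f_i of the derivation (e.g., how often each rule is used):
$\Pr(d_1 \mid \text{capital of Ohio}) = \exp\left(\sum_i \lambda_i f_i(d_1)\right) / Z$
$\Pr(d_2 \mid \text{capital of Ohio}) = \exp\left(\sum_i \lambda_i f_i(d_2)\right) / Z$
where Z is a normalization constant.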
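As a rough illustration of this log-linear scoring, the sketch below assumes the only features f_i are rule-usage counts and uses made-up weights in `LAMBDA`; both are assumptions for illustration, not learned WASP parameters. For simplicity Z is summed over just the two candidate derivations, whereas the full model normalizes over every derivation of the input sentence.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical weights lambda_i, one per rule; a trained model would learn these.
LAMBDA: Dict[str, float] = {
    "STATE -> Ohio / stateid('ohio')": 1.2,
    "RIVER -> Ohio / riverid('ohio')": 0.4,
    "CITY -> capital CITY / capital(CITY)": 0.9,
    "CITY -> of STATE / loc_2(STATE)": 0.7,
    "CITY -> of RIVER / loc_2(RIVER)": 0.3,
}


def linear_score(rules_used: List[str]) -> float:
    """sum_i lambda_i * f_i(d), where f_i(d) counts how often rule i is used in d."""
    return sum(LAMBDA[rule] * n for rule, n in Counter(rules_used).items())


def derivation_probs(derivations: Dict[str, List[str]]) -> Dict[str, float]:
    """Pr(d | e) = exp(score(d)) / Z, with Z summed over the given derivations."""
    scores = {name: linear_score(rules) for name, rules in derivations.items()}
    top = max(scores.values())  # subtract the max before exp for numerical stability
    unnorm = {name: math.exp(s - top) for name, s in scores.items()}
    z = sum(unnorm.values())
    return {name: u / z for name, u in unnorm.items()}


probs = derivation_probs({
    "d1": ["STATE -> Ohio / stateid('ohio')",
           "CITY -> capital CITY / capital(CITY)",
           "CITY -> of STATE / loc_2(STATE)"],
    "d2": ["RIVER -> Ohio / riverid('ohio')",
           "CITY -> capital CITY / capital(CITY)",
           "CITY -> of RIVER / loc_2(RIVER)"],
})
print(probs)  # d1 is preferred because its rules carry larger weights
```

Subtracting the maximum score before exponentiating leaves the normalized probabilities unchanged; it only avoids overflow when scores are large.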
Overview of WASP
Training: lexical acquisition uses the training set {(e,f)} and an unambiguous CFG of the MRL to build a lexicon L (an SCFG); parameter estimation then produces an SCFG parameterized by λ.
Testing: semantic parsing uses the parameterized SCFG to map an input sentence e' to an output MR f'.
Tactical Generation
Can be seen as the inverse of semantic parsing:
Semantic parsing: The goalie should always stay in our half → ((true) (do our {1} (pos (half our))))
Tactical generation: ((true) (do our {1} (pos (half our)))) → The goalie should always stay in our half
Tactical Generation: Generation by Inverting WASP
The same synchronous grammar is used for both generation and semantic parsing:
QUERY → What is CITY / answer(CITY)
Semantic parsing: the input is NL and the output is MRL.
Tactical generation: the input is MRL and the output is NL.
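To show concretely how one grammar can serve both directions, here is a minimal Python sketch over the toy "capital of Ohio" rules (a hypothetical encoding; `translate`, `parse`, and `generate` are illustrative names, not WASP functions). Parsing matches the NL side of each rule and emits the MR side; generation just swaps the two side indices. It returns the first successful match with no probabilistic disambiguation, and it assumes every rule side begins with a terminal so the backtracking search terminates.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy SCFG: (lhs, NL side, MR side). Upper-case tokens are
# nonterminals; each appears at most once per side, with the same name on both.
Rule = Tuple[str, List[str], List[str]]
RULES: List[Rule] = [
    ("QUERY", "What is CITY".split(), "answer ( CITY )".split()),
    ("CITY", "the capital CITY".split(), "capital ( CITY )".split()),
    ("CITY", "of STATE".split(), "loc_2 ( STATE )".split()),
    ("STATE", "Ohio".split(), "stateid ( 'ohio' )".split()),
]
NONTERMINALS = {lhs for lhs, _, _ in RULES}
NL, MR = 1, 2  # positions of the two sides inside a rule tuple


def translate(symbol: str, tokens: List[str], src: int, tgt: int) -> Optional[List[str]]:
    """Rewrite tokens derived from `symbol` on side `src` into side `tgt`."""
    for rule in RULES:
        if rule[0] != symbol:
            continue
        bindings = match(rule[src], tokens, src, tgt)
        if bindings is not None:
            # Copy target-side terminals; substitute translated nonterminals.
            return [w for tok in rule[tgt] for w in bindings.get(tok, [tok])]
    return None


def match(pattern: List[str], tokens: List[str], src: int,
          tgt: int) -> Optional[Dict[str, List[str]]]:
    """Backtracking match of one rule side against tokens; returns the
    target-side translation bound to each nonterminal, or None."""
    if not pattern:
        return {} if not tokens else None
    head, rest = pattern[0], pattern[1:]
    if head not in NONTERMINALS:
        if tokens and tokens[0] == head:
            return match(rest, tokens[1:], src, tgt)
        return None
    for end in range(1, len(tokens) + 1):  # try every span for the nonterminal
        sub = translate(head, tokens[:end], src, tgt)
        if sub is not None:
            tail = match(rest, tokens[end:], src, tgt)
            if tail is not None:
                return {head: sub, **tail}
    return None


def parse(sentence: str) -> Optional[str]:
    mr = translate("QUERY", sentence.split(), NL, MR)
    return None if mr is None else "".join(mr)


def generate(mr_tokens: str) -> Optional[str]:
    nl = translate("QUERY", mr_tokens.split(), MR, NL)
    return None if nl is None else " ".join(nl)


print(parse("What is the capital of Ohio"))
print(generate("answer ( capital ( loc_2 ( stateid ( 'ohio' ) ) ) )"))
```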
Learning Language from Perceptual Context
Children do not learn language from annotated corpora.
Neither do they learn language from just reading the newspaper, surfing the web, or listening to the radio.
The natural way to learn language is to perceive language in the context of its use in the physical and social world.
This requires inferring the meaning of utterances from their perceptual context.
Language Grounding
The meanings of many words are grounded in our perception of the physical world: red, ball, cup, run, hit, fall, etc.
–Symbol Grounding: Harnad (1990)
Even many abstract words and meanings are metaphorical abstractions of terms grounded in the physical world: up, down, over, in, etc.
–Lakoff and Johnson’s Metaphors We Live By
It's difficult to put my ideas into words.
Interest in competitions is up.
Most work in NLP tries to represent meaning without any connection to perception or to the physical world, circularly defining the meanings of words in terms of other words or meaningless symbols with no firm foundation.
“Mary is on the phone” ???
Ambiguous Supervision for Learning Semantic Parsers
A computer system simultaneously exposed to perceptual contexts and natural-language utterances should be able to learn the underlying language semantics.
We consider ambiguous training data of sentences associated with multiple potential MRs.
–Siskind (1996) uses this type of “referentially uncertain” training data to learn the meanings of words.
Extracting meaning representations from perceptual data is a difficult, unsolved problem.
–Our system works directly with symbolic MRs.
Ambiguous Training Example
“Mary is on the phone” ???
Candidate meanings from the perceptual context:
Ironing(Mommy, Shirt)
Working(Sister, Computer)
Carrying(Daddy, Bag)
Talking(Mary, Phone)
Sitting(Mary, Chair)
Next Ambiguous Training Example
“Mommy is ironing a shirt” ???
Candidate meanings from the perceptual context:
Ironing(Mommy, Shirt)
Working(Sister, Computer)
Talking(Mary, Phone)
Sitting(Mary, Chair)
Ambiguous Supervision for Learning Semantic Parsers (contd.)
Our model of ambiguous supervision corresponds to the type of data that will be gathered from a temporal sequence of perceptual contexts with occasional language commentary.
We assume each sentence has exactly one meaning in its perceptual context.
–Recently extended to handle sentences with no meaning in their perceptual context.
Each meaning is associated with at most one sentence.
Sample Ambiguous Corpus
Sentences:
Daisy gave the clock to the mouse.
Mommy saw that Mary gave the hammer to the dog.
The dog broke the box.
John gave the bag to the mouse.
The dog threw the ball.
Candidate MRs:
ate(mouse, orange)
gave(daisy, clock, mouse)
ate(dog, apple)
saw(mother, gave(mary, dog, hammer))
broke(dog, box)
gave(woman, toy, mouse)
gave(john, bag, mouse)
threw(dog, ball)
runs(dog)
saw(john, walks(man, dog))
The links between sentences and their possible MRs form a bipartite graph.
KRISPER: KRISP with EM-like Retraining
An extension of KRISP that learns from ambiguous supervision.
Uses an iterative EM-like method to gradually converge on a correct meaning for each sentence.
KRISPER’s Training Algorithm
(Illustrated on the sample ambiguous corpus above.)
1. Assume every possible meaning for a sentence is correct.
2. The resulting NL-MR pairs are weighted and given to KRISP (the example shows pair weights of 1/2, 1/4, 1/5, and 1/3, i.e., one over the number of candidate MRs of the sentence).
3. Estimate the confidence of each NL-MR pair using the resulting trained parser.
4. Use maximum weighted matching on a bipartite graph to find the best NL-MR pairs [Munkres, 1957].
5. Give the best pairs to KRISP in the next iteration, and repeat until convergence.
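Steps 1, 2, and 4 can be sketched in Python under stated assumptions: the 1/k initial weighting is an assumed reading of the weights shown, `confidence` is a hypothetical stand-in for the trained KRISP parser's scores (step 3), and exhaustive search replaces the Hungarian (Munkres) algorithm, so it only suits tiny examples. The toy confidences show why a global matching beats greedily picking each sentence's top-scoring MR.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set


def initial_weights(candidates: Sequence[Sequence[str]]) -> List[Dict[str, float]]:
    """Steps 1-2: treat every candidate MR as correct, giving each of a
    sentence's k candidates weight 1/k (assumed weighting scheme)."""
    return [{mr: 1.0 / len(mrs) for mr in mrs} for mrs in candidates]


def best_matching(candidates: Sequence[Sequence[str]],
                  confidence: Callable[[int, str], float]) -> Optional[List[str]]:
    """Step 4: choose one distinct MR per sentence maximizing total confidence.

    Exhaustive search stands in for the Hungarian (Munkres) algorithm, so this
    is exponential and only suitable for tiny examples. Returns None if no
    assignment gives every sentence its own MR."""
    best_total = float("-inf")
    best: Optional[List[str]] = None
    chosen: List[str] = []
    used: Set[str] = set()

    def search(i: int, total: float) -> None:
        nonlocal best_total, best
        if i == len(candidates):
            if total > best_total:
                best_total, best = total, list(chosen)
            return
        for mr in candidates[i]:
            if mr in used:  # each MR may explain at most one sentence
                continue
            used.add(mr)
            chosen.append(mr)
            search(i + 1, total + confidence(i, mr))
            chosen.pop()
            used.remove(mr)

    search(0, 0.0)
    return best


# Hypothetical candidate sets and parser confidences for three sentences:
# "Daisy gave the clock to the mouse.", "John gave the bag to the mouse.",
# "The dog threw the ball."
candidates = [
    ["ate(mouse, orange)", "gave(daisy, clock, mouse)", "gave(john, bag, mouse)"],
    ["gave(daisy, clock, mouse)", "gave(john, bag, mouse)"],
    ["threw(dog, ball)", "runs(dog)"],
]
conf = {
    (0, "ate(mouse, orange)"): 0.1, (0, "gave(daisy, clock, mouse)"): 0.6,
    (0, "gave(john, bag, mouse)"): 0.7,
    (1, "gave(daisy, clock, mouse)"): 0.5, (1, "gave(john, bag, mouse)"): 0.9,
    (2, "threw(dog, ball)"): 0.8, (2, "runs(dog)"): 0.3,
}
print(best_matching(candidates, lambda i, mr: conf[(i, mr)]))
# Greedy would give sentence 0 gave(john, ...) (0.7); the matching gives it
# gave(daisy, ...) so that sentence 1 can take gave(john, ...) (0.9).
```

Steps 3 and 5, retraining KRISP on the weighted or matched pairs, are deliberately left out; in a real system they would call the parser learner.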
Results on Ambig-ChildWorld Corpus
New Challenge: Learning to Be a Sportscaster
Goal: Learn from realistic data of natural language used in a representative context while avoiding difficult issues in computer perception (i.e., speech and vision).
Solution: Learn from textually annotated traces of activity in a simulated environment.
Example: Traces of games in the Robocup simulator paired with textual sportscaster commentary.
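Grounded Language Learning in Robocup
[Figure: the Robocup simulator drives simulated perception, yielding perceived facts; paired with the sportscaster's commentary (e.g., “Score!!!!”), these feed a grounded language learner that produces an SCFG used by both a semantic parser and a language generator (which outputs, e.g., “Score!!!!”)]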
Robocup Sportscaster Trace
Natural Language Commentary:
purple7 passes the ball out to purple6
purple6 passes to purple2
purple2 makes a short pass to purple3
purple3 loses the ball to pink9
Meaning Representation (perceived events, in order):
pass ( purple7, purple6 )
ballstopped
kick ( purple6 )
pass ( purple6, purple2 )
ballstopped
kick ( purple2 )
pass ( purple2, purple3 )
kick ( purple3 )
badPass ( purple3, pink9 )
turnover ( purple3, pink9 )
Sportscasting Data
Collected human textual commentary for the 4 Robocup championship games from
–Avg # events/game = 2,613
–Avg # sentences/game = 509
Each sentence is matched to all events within the previous 5 seconds.
–Avg # MRs/sentence = 2.5 (min 1, max 12)
Manually annotated with correct matchings of sentences to MRs (for evaluation purposes only).
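A minimal sketch of this candidate-generation step, assuming each perceived event carries a timestamp in seconds; the `Event` type, the timestamps, and the inclusive window boundaries are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    time: float  # seconds since kickoff (hypothetical field)
    mr: str


def candidate_mrs(comment_time: float, events: List[Event],
                  window: float = 5.0) -> List[str]:
    """MRs of all events in the `window` seconds up to and including the comment."""
    return [e.mr for e in events if comment_time - window <= e.time <= comment_time]


events = [
    Event(10.0, "pass ( purple7, purple6 )"),
    Event(12.5, "ballstopped"),
    Event(13.0, "kick ( purple6 )"),
    Event(16.2, "pass ( purple6, purple2 )"),
]
print(candidate_mrs(16.5, events))
# ['ballstopped', 'kick ( purple6 )', 'pass ( purple6, purple2 )']
```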
WASPER
WASP with EM-like retraining to handle ambiguous training data.
Same augmentation as added to KRISP to create KRISPER.
KRISPER-WASP
First iteration of EM-like training produces very noisy training data (> 50% errors).
KRISP is better than WASP at handling noisy training data.
–SVM prevents overfitting.
–String kernel allows partial matching.
But KRISP does not support language generation.
First train KRISPER just to determine the best NL→MR matchings.
Then train WASP on the resulting unambiguously supervised data.
WASPER-GEN
In KRISPER and WASPER, the correct MR for each sentence is chosen based on maximizing the confidence of semantic parsing (NL→MR).
Instead, WASPER-GEN determines the best matching based on generation (MR→NL).
Score each potential NL/MR pair by using the currently trained WASP⁻¹ generator.
Compute the NIST MT score (an alternative to the BLEU score) between the generated sentence and the potential matching sentence.
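The generation-based scoring can be sketched as below. A hypothetical `generate` callable stands in for the trained WASP⁻¹ generator, and a simple unigram-overlap F1 replaces the NIST score purely to keep the example short; it is not the NIST metric. Scores like these would then feed the same bipartite matching step used by KRISPER.

```python
from collections import Counter
from typing import Callable, Dict, List


def overlap_f1(hyp: str, ref: str) -> float:
    """Placeholder similarity, NOT the NIST metric: unigram-overlap F1."""
    h, r = Counter(hyp.lower().split()), Counter(ref.lower().split())
    common = sum((h & r).values())  # clipped count of shared words
    if common == 0:
        return 0.0
    precision, recall = common / sum(h.values()), common / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def generation_scores(sentence: str, candidates: List[str],
                      generate: Callable[[str], str]) -> Dict[str, float]:
    """Score each candidate MR by comparing the text generated from it to the sentence."""
    return {mr: overlap_f1(generate(mr), sentence) for mr in candidates}


# Hypothetical stand-in for a trained MR -> NL generator.
TOY_GENERATOR = {
    "pass ( purple6, purple2 )": "purple6 passes to purple2",
    "kick ( purple6 )": "purple6 kicks the ball",
    "ballstopped": "the ball stops",
}
scores = generation_scores("purple6 passes to purple2", list(TOY_GENERATOR),
                           lambda mr: TOY_GENERATOR[mr])
print(max(scores, key=scores.get))  # pass ( purple6, purple2 )
```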
Strategic Generation
Generation requires not only knowing how to say something (tactical generation) but also what to say (strategic generation).
For automated sportscasting, one must be able to effectively choose which events to describe.
Example of Strategic Generation
pass ( purple7, purple6 )
ballstopped
kick ( purple6 )
pass ( purple6, purple2 )
ballstopped
kick ( purple2 )
pass ( purple2, purple3 )
kick ( purple3 )
badPass ( purple3, pink9 )
turnover ( purple3, pink9 )
Learning for Strategic Generation
For each event type (e.g., pass, kick), estimate the probability that it is described by the sportscaster.
Requires an NL/MR matching that indicates which events were described, but this is not provided in the ambiguous training data.
–Use the estimated matching computed by KRISPER, WASPER, or WASPER-GEN.
–Use a version of EM to determine the probability of mentioning each event type based only on strategic information.
EM for Strategic Generation
Using the Robocup sportscaster trace above, each of the four sentences is linked to the events it could describe, with initial link weights (1, 1/4, and 1/5 in the example).
1. Estimate generation probabilities: for each event type, sum the link weights on events of that type and divide by the number of such events:
P(pass) = (1+1/4+1/4+1/4+1/5)/3 = 0.65
P(ballstopped) = (1/4+1/4)/2 = 0.25
P(kick) = (1/4+1/4+1/5+1/5)/3 = 0.3
P(badPass) = (1/5)/1 = 0.2
P(turnover) = (1/5)/1 = 0.2
2. Reassign link weights: each link gets the generation probability of its event's type.
3. Normalize the link weights for each sentence.
4. Recalculate the generation probabilities and repeat until convergence.
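The EM loop can be sketched in Python as follows. This is an illustrative reading of the slides, not the authors' code: initial link weights are assumed uniform over each sentence's candidate events, `event_types` and `links` are hypothetical inputs, and a fixed iteration count stands in for a convergence test. Every sentence must have at least one candidate event.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def em_strategic(event_types: Sequence[str], links: Sequence[Sequence[int]],
                 iterations: int = 10) -> Dict[str, float]:
    """Estimate, per event type, the probability that an event of that type is described.

    event_types[j] is the type of event j; links[s] lists the events sentence s
    could describe (each sentence needs at least one candidate)."""
    type_counts = Counter(event_types)
    # Initial link weights: uniform over each sentence's candidate events (assumption).
    weights: List[Dict[int, float]] = [{j: 1.0 / len(c) for j in c} for c in links]
    probs: Dict[str, float] = {}
    for _ in range(iterations):
        # Estimate generation probs: link weight on each type / number of events of that type.
        mass: Dict[str, float] = defaultdict(float)
        for w in weights:
            for j, wt in w.items():
                mass[event_types[j]] += wt
        probs = {t: mass[t] / n for t, n in type_counts.items()}
        # Reassign each link its event type's probability, then normalize per sentence.
        weights = []
        for c in links:
            raw = {j: probs[event_types[j]] for j in c}
            z = sum(raw.values())
            weights.append({j: v / z for j, v in raw.items()})
    return probs


print(em_strategic(["pass", "kick", "pass"], [[0, 1], [2]], iterations=1))
# -> {'pass': 0.75, 'kick': 0.5}
```

With this toy input the second sentence has only one candidate (a pass), so further iterations keep shifting weight toward pass: P(pass) rises toward 1 while P(kick) falls toward 0.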
Demo
Game clip commentated using WASPER-GEN with EM-based strategic generation, since this gave the best results for generation.
FreeTTS was used to synthesize speech from the textual output.
Experimental Evaluation
Generated learning curves by training on all combinations of 1 to 3 games and testing on all games not used for training.
Baselines:
–Random Matching: WASP trained on a random choice of possible MR for each comment.
–Gold Matching: WASP trained on the correct matching of MR for each comment.
Metrics:
–Precision: % of the system’s annotations that are correct
–Recall: % of gold-standard annotations correctly produced
–F-measure: harmonic mean of precision and recall
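A minimal sketch of these metrics, assuming annotations are represented as hashable items such as (sentence id, MR) pairs and compared by exact match; the toy gold and system sets are invented for illustration. The same function applies to strategic generation if the items are the IDs of events chosen for comment.

```python
from typing import Hashable, Set, Tuple


def precision_recall_f1(system: Set[Hashable],
                        gold: Set[Hashable]) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure (harmonic mean) of exact-match annotations."""
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Invented toy annotations: (sentence id, MR) pairs.
gold = {(0, "pass ( purple7, purple6 )"), (1, "pass ( purple6, purple2 )"),
        (2, "pass ( purple2, purple3 )"), (3, "turnover ( purple3, pink9 )")}
system = {(0, "pass ( purple7, purple6 )"), (1, "kick ( purple6 )"),
          (2, "pass ( purple2, purple3 )")}
p, r, f = precision_recall_f1(system, gold)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.5 0.571
```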
Evaluating Matching Accuracy
Measure how accurately various methods assign MRs to sentences in the ambiguous training data.
Use gold-standard matches to evaluate correctness.
Results on Matching
Evaluating Semantic Parsing
Measure how accurately the learned parser maps sentences to their correct meanings in the test games.
Use the gold-standard matches to determine the correct MR for each sentence that has one.
A generated MR must exactly match the gold standard to count as correct.
Results on Semantic Parsing
Evaluating Tactical Generation
Measure how accurately the NL generator produces English sentences for chosen MRs in the test games.
Use gold-standard matches to determine the correct sentence for each MR that has one.
Use the NIST score to compare the generated sentence to the one in the gold standard.
Results on Tactical Generation
Evaluating Strategic Generation
In the test games, measure how accurately the system determines which perceived events to comment on.
Compare the subset of events chosen by the system to the subset chosen by the human annotator (as given by the gold-standard matching).
Results on Strategic Generation
Human Evaluation (Quasi Turing Test)
Asked 4 fluent English speakers to evaluate the overall quality of sportscasts.
Randomly picked a 2-minute segment from each of the 4 games.
Each human judge evaluated 8 commented game clips: each of the 4 segments commented once by a human and once by the machine when tested on that game.
The 8 clips presented to each judge were shown in random, counter-balanced order.
Judges were not told which ones were human- or machine-generated.
Human Evaluation Metrics
Score | English Fluency | Semantic Correctness | Sportscasting Ability
5 | Flawless | Always | Excellent
4 | Good | Usually | Good
3 | Non-native | Sometimes | Average
2 | Disfluent | Rarely | Bad
1 | Gibberish | Never | Terrible
Results on Human Evaluation
[Table: English Fluency, Semantic Correctness, and Sportscasting Ability scores for the Human and Machine commentators]
Immediate Future Directions
Use strategic generation information to improve resolution of ambiguous training data.
Produce generation confidences (instead of NIST scores) for scoring NL/MR matches in WASPER-GEN.
Improve WASP’s ability to handle noisy training data.
Improve simulated perception to extract more detailed and interesting symbolic facts from the simulator.
Longer-Term Future Directions
Apply the approach to learning situated language in a computer video-game environment (Gorniak & Roy, 2005)
–Teach game AIs how to talk to you!
Apply the approach to captioned images or video, using computer vision to extract objects, relations, and events from real perceptual data (Fleischman & Roy, 2007)
Conclusions
Current language learning work uses expensive, unrealistic training data.
We have developed a language learning system that can learn from language paired with an ambiguous perceptual environment.
We have evaluated it on the task of learning to sportscast simulated Robocup games.
The system learns to sportscast almost as well as humans.