A Generative Model for Parsing Natural Language to Meaning Representations

Wei Lu, Hwee Tou Ng, Wee Sun Lee (National University of Singapore)
Luke S. Zettlemoyer (Massachusetts Institute of Technology)
Classic Goal of NLP: Understanding Natural Language

Mapping a natural language (NL) sentence to a meaning representation (MR).

Example NL sentence: How many states do not have rivers ?
Meaning Representation (MR)

How many states do not have rivers ?

QUERY:answer(NUM)
  NUM:count(STATE)
    STATE:exclude(STATE STATE)
      STATE:state(all)
      STATE:loc_1(RIVER)
        RIVER:river(all)
MR Production

A meaning representation production (MR production), e.g. NUM:count(STATE):
  Semantic category: NUM
  Function symbol: count
  Child semantic category: STATE
An MR production has at most 2 child semantic categories.
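As a concrete illustration, an MR production string of the form above can be split into its three parts. This is a minimal sketch, assuming productions are written exactly as on the slide (category, colon, function symbol, parenthesized arguments); the function name is hypothetical.

```python
import re

def parse_production(s):
    """Split an MR production string such as 'NUM:count(STATE)' into its
    semantic category, function symbol, and argument list.  An argument may
    be a child semantic category (e.g. STATE) or the terminal 'all'."""
    m = re.fullmatch(r"(\w+):(\w+)\(([^)]*)\)", s)
    if m is None:
        raise ValueError("not an MR production: %r" % s)
    category, symbol, args = m.groups()
    return category, symbol, args.split()
```

For example, `parse_production("STATE:exclude(STATE STATE)")` yields the category `STATE`, the function symbol `exclude`, and the two child categories.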
Task Description

Training data: NL-MR pairs
Input: a new NL sentence
Output: an MR
Challenge

The mapping from individual NL words to their associated MR productions is not given in the NL-MR pairs.
Mapping Words to MR Productions

how many states do not have rivers ?

QUERY:answer(NUM)
NUM:count(STATE)
STATE:exclude(STATE STATE)
STATE:state(all)
STATE:loc_1(RIVER)
RIVER:river(all)
Talk Outline

Generative model
  Goal: a flexible model that can parse a wide range of input sentences
  Efficient algorithms for EM training and decoding
Reranking
  In practice, the correct output is often in the top-k list, but is not always the best-scoring option
  Global features
Evaluation
  The generative model combined with the reranking technique achieves state-of-the-art performance
NL-MR Pair as a Hybrid Tree

How many states do not have rivers ?

QUERY:answer(NUM)
  NUM:count(STATE)
    STATE:exclude(STATE STATE)
      STATE:state(all)
      STATE:loc_1(RIVER)
        RIVER:river(all)

Each internal node of the hybrid tree is an MR production; its children form a hybrid sequence of NL words and semantic categories.
Model Parameters

w: the NL sentence;  m: the MR;  T: the hybrid tree

P(w, m, T)
 = P(QUERY:answer(NUM) | -, arg=1)
 * P(NUM ? | QUERY:answer(NUM))
 * P(NUM:count(STATE) | QUERY:answer(NUM), arg=1)
 * P(How many STATE | NUM:count(STATE))
 * P(STATE:exclude(STATE STATE) | NUM:count(STATE), arg=1)
 * P(STATE1 do not STATE2 | STATE:exclude(STATE STATE))
 * P(STATE:state(all) | STATE:exclude(STATE STATE), arg=1)
 * P(states | STATE:state(all))
 * P(STATE:loc_1(RIVER) | STATE:exclude(STATE STATE), arg=2)
 * P(have RIVER | STATE:loc_1(RIVER))
 * P(RIVER:river(all) | STATE:loc_1(RIVER), arg=1)
 * P(rivers | RIVER:river(all))

MR model parameters: ρ(m' | m, arg=k)
Pattern Parameters

P(How many STATE | NUM:count(STATE))
 = P(mwY | NUM:count(STATE))            <- pattern parameter Φ(r | m)
 * P(How | NUM:count(STATE), BEGIN)
 * P(many | NUM:count(STATE), How)
 * P(STATE | NUM:count(STATE), many)
 * P(END | NUM:count(STATE), STATE)
Hybrid Patterns

#RHS  Hybrid Pattern        # Patterns
 1    M w                    1
      M [w] Y [w]            4
 2    M [w] Y [w] Z [w]      8
      M [w] Z [w] Y [w]

M is an MR production, w is a word sequence,
Y and Z are respectively the first and second child MR production.
Note: [] denotes an optional word sequence.
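The pattern counts in the table follow directly from the optional slots: each bracketed `[w]` doubles the number of concrete patterns. A small sketch (the function name and template notation are illustrative, with `M` left implicit on the left-hand side):

```python
from itertools import product

def expand(template):
    """Expand a hybrid-pattern template: tokens written like '[w]' are
    optional word sequences, so each one doubles the number of concrete
    patterns; bare tokens (Y, Z, w) are mandatory."""
    slots = []
    for tok in template.split():
        if tok.startswith("["):          # optional slot: absent or present
            slots.append(("", tok.strip("[]")))
        else:                            # mandatory symbol
            slots.append((tok,))
    return [" ".join(t for t in combo if t) for combo in product(*slots)]
```

For instance, `expand("[w] Y [w]")` gives the 4 one-child patterns (`Y`, `Y w`, `w Y`, `w Y w`), and each three-slot two-child template expands to 8 patterns, matching the table.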
Emission Parameters

P(How many STATE | NUM:count(STATE))
 = P(mwY | NUM:count(STATE))
 * P(How | NUM:count(STATE), BEGIN)     <- emission parameters θ(t | m, Λ)
 * P(many | NUM:count(STATE), How)
 * P(STATE | NUM:count(STATE), many)
 * P(END | NUM:count(STATE), STATE)
Assumptions: Models I, II, III

Example hybrid sequence: BEGIN How many STATE END, emitted from NUM:count(STATE)

Model I   (Unigram Model):  θ(ti | M, Λ) = P(ti | M)
Model II  (Bigram Model):   θ(ti | M, Λ) = P(ti | M, ti-1)
Model III (Mixgram Model):  θ(ti | M, Λ) = 0.5 * [P(ti | M, ti-1) + P(ti | M)]
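The three assumptions differ only in how the emission probability is looked up. A minimal sketch, assuming unigram and bigram tables are given as nested dicts (the function name and table layout are illustrative, not the paper's implementation):

```python
def emission_prob(t, production, prev, unigram, bigram, model="III"):
    """Emission probability theta(t | production, context) under the three
    independence assumptions: Model I ignores the previous token, Model II
    conditions on it, and Model III interpolates the two with equal weight."""
    p_uni = unigram[production].get(t, 0.0)
    p_bi = bigram[production].get((prev, t), 0.0)
    if model == "I":
        return p_uni
    if model == "II":
        return p_bi
    return 0.5 * (p_bi + p_uni)          # Model III (mixgram)
```

With toy tables where P(How | NUM:count(STATE)) = 0.5 and P(How | NUM:count(STATE), BEGIN) = 1.0, Model III yields 0.75, the midpoint of the two estimates.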
Model Parameters

MR model parameters: Σmi ρ(mi | mj, arg=k) = 1
  They model the meaning representation.
Emission parameters: Σt θ(t | mj, Λ) = 1
  They model the emission of words and semantic categories from MR productions. Λ is the context.
Pattern parameters: Σr Φ(r | mj) = 1
  They model the selection of hybrid patterns.
Parameter Estimation

MR model parameters are easy to estimate.
Learning the emission and pattern parameters is challenging:
  Inside-outside algorithm with EM
  Naive implementation: O(n^6 * m) time
    n: number of words in the NL sentence; m: number of MR productions in the MR
  Improved algorithm with two-layer dynamic programming: O(n^3 * m) time
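The full inside-outside training over hybrid trees is beyond a slide, but the core idea (EM over a hidden word-to-production correspondence) can be sketched in a much-simplified form that drops the tree structure entirely, closer in spirit to IBM Model 1. Everything here (function name, data layout) is an illustrative assumption, not the paper's algorithm:

```python
from collections import defaultdict

def em_emissions(pairs, iterations=20):
    """Toy EM for emission parameters theta(word | MR production).
    `pairs` is a list of (words, productions) tuples; which word goes with
    which production is hidden, as in the paper's training setting, but the
    hybrid-tree structure is ignored here for simplicity."""
    productions = {m for _, ms in pairs for m in ms}
    vocab = {w for ws, _ in pairs for w in ws}
    # Uniform initialisation.
    theta = {m: {w: 1.0 / len(vocab) for w in vocab} for m in productions}
    for _ in range(iterations):
        counts = defaultdict(lambda: defaultdict(float))
        for words, ms in pairs:               # E-step: expected counts
            for w in words:
                z = sum(theta[m][w] for m in ms)
                for m in ms:
                    counts[m][w] += theta[m][w] / z
        for m in counts:                      # M-step: renormalise
            total = sum(counts[m].values())
            theta[m] = {w: c / total for w, c in counts[m].items()}
    return theta
```

Even this stripped-down version picks up the intended co-occurrences: trained on two sentences sharing "states have", it pushes the emission mass for RIVER:river(all) toward "rivers".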
Decoding

Given an NL sentence w, find the optimal MR m*:

  m* = argmax_m P(m | w) = argmax_m Σ_T P(m, T | w) = argmax_m Σ_T P(w, m, T)

Instead, we find the MR of the most likely hybrid tree:

  m* ≈ argmax_m max_T P(w, m, T)

Similar DP techniques are employed, and an exact top-k decoding algorithm is implemented.
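The gap between the exact objective (summing over trees) and the Viterbi approximation (keeping one tree) is easy to see on an explicit candidate list. This sketch stands in for the packed forest the paper's DP enumerates; the function name and triple layout are illustrative:

```python
import heapq
from collections import defaultdict

def decode(candidates, k=50):
    """`candidates` is a list of (mr, tree, joint_prob) triples.  The exact
    objective marginalises P(w, m, T) over all trees for each MR; the
    approximation used in practice keeps only the most likely hybrid tree.
    Also returns the exact top-k list of trees used for reranking."""
    summed = defaultdict(float)
    best_tree = {}
    for mr, tree, p in candidates:
        summed[mr] += p                       # exact: sum over trees
        if p > best_tree.get(mr, (None, 0.0))[1]:
            best_tree[mr] = (tree, p)         # Viterbi approximation
    exact = max(summed, key=summed.get)
    viterbi = max(best_tree, key=lambda m: best_tree[m][1])
    topk = heapq.nlargest(k, candidates, key=lambda c: c[2])
    return exact, viterbi, topk
```

Note the two answers can disagree: if one MR has two trees of probability 0.3 each and another has a single tree of 0.4, the exact objective prefers the first (0.6 total) while the Viterbi approximation prefers the second.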
Reranking

Weakness of the generative model: it lacks the ability to model long-range dependencies.

Reranking with the averaged perceptron:
  Output space: hybrid trees from the exact top-k (k=50) decoding algorithm, for each training/test NL sentence
  Single correct reference: output of the Viterbi algorithm for each training instance
  Feature functions: features 1-5 are indicator functions; feature 6 is real-valued
  A threshold b prunes unreliable predictions even when they score the highest, to optimize F-measure
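An averaged-perceptron reranker over a top-k list can be sketched in a few lines. This is a generic version of the algorithm, not the paper's code; the instance and feature layouts are illustrative assumptions:

```python
from collections import defaultdict

def train_averaged_perceptron(instances, features, epochs=10):
    """Averaged-perceptron reranker sketch.  Each instance is
    (candidates, gold): `candidates` is a top-k list and `gold` the single
    correct reference.  `features` maps a candidate to a
    {feature_name: value} dict (cf. features 1-6 on the next slide)."""
    w = defaultdict(float)          # current weights
    w_sum = defaultdict(float)      # running sum for averaging
    t = 0
    for _ in range(epochs):
        for candidates, gold in instances:
            t += 1
            score = lambda c: sum(w[f] * v for f, v in features(c).items())
            pred = max(candidates, key=score)
            if pred != gold:        # promote gold, demote the mistake
                for f, v in features(gold).items():
                    w[f] += v
                for f, v in features(pred).items():
                    w[f] -= v
            for f in w:             # accumulate for the averaged weights
                w_sum[f] += w[f]
    return {f: s / t for f, s in w_sum.items()}
```

Averaging the weight vector over all updates is the standard trick for making the perceptron's output less sensitive to the order of training instances.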
Reranking Features: Examples

Feature 1: Hybrid Rule: an MR production and its child hybrid sequence
Feature 2: Expanded Hybrid Rule: an MR production and its child hybrid sequence, expanded
Feature 3: Long-range Unigram: an MR production and an NL word appearing below it in the tree
Feature 4: Grandchild Unigram: an MR production and its grandchild NL word
Feature 5: Two-level Unigram: an MR production, its parent production, and its child NL word
Feature 6: Model Log-Probability: logarithm of the base model's joint probability, log(P(w, m, T))
Related Work

SILT (Kate, Wong, and Mooney, 2005): a system that learns deterministic rules to transform either sentences or their syntactic parse trees into meaning structures
WASP (Wong and Mooney, 2006): a system motivated by statistical machine translation techniques
KRISP (Kate and Mooney, 2006): a discriminative approach where meaning representation structures are constructed from the natural language strings hierarchically
Evaluation Metrics

Precision = # correct output structures / # output structures
Recall    = # correct output structures / # input sentences
F-measure = 2 / (1/Precision + 1/Recall)
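The three formulas above translate directly into code (the function name is illustrative). Precision and recall differ because the system may decline to produce an output for some inputs, so the number of outputs can be smaller than the number of input sentences:

```python
def evaluate(n_correct, n_output, n_input):
    """Precision, recall, and F-measure as defined on the slide."""
    precision = n_correct / n_output
    recall = n_correct / n_input
    f = 2.0 / (1.0 / precision + 1.0 / recall)   # harmonic mean of P and R
    return precision, recall, f
```

For instance, 75 correct structures out of 100 outputs on 150 input sentences gives precision 0.75, recall 0.5, and F-measure 0.6.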
Evaluations

Comparison over the three models (I/II/III: Unigram/Bigram/Mixgram; +R: with reranking)

          Geoquery (880)        Robocup (300)
Model     Prec.  Rec.   F       Prec.  Rec.   F
I         81.3   77.1   79.1    71.1   64.0   67.4
II        89.0   76.0   82.0    82.4   57.7   67.8
III       86.2   81.8   84.0    70.4   63.3   66.7
I + R     87.5   80.5   83.8           67.0   72.6
II + R    93.2   73.6   82.3    88.4   56.0   68.6
III + R   89.3   81.5   85.2    82.5   67.7   74.4

Reranking is shown to be effective. Overall, Model III with reranking performs best.
Evaluations

Comparison with other systems

          Geoquery (880)        Robocup (300)
System    Prec.  Rec.   F       Prec.  Rec.   F
SILT      89.0   54.1   67.3    83.9   50.7   63.2
WASP      87.2   74.8   80.5    88.9   61.9   73.0
KRISP     93.3   71.7   81.1    85.2
III + R   89.3   81.5   85.2    82.5   67.7   74.4

On Geoquery: able to handle more than 25% of the inputs that could not be handled by previous systems; an error reduction rate of 22%.
Evaluations

Comparison on other languages: achieves performance comparable to the previous system.

          English               Spanish
System    Prec.  Rec.   F       Prec.  Rec.   F
WASP      95.42  70.00  80.76   91.99  72.40  81.03
III + R   91.46  72.80  81.07   95.19  79.20  86.46

          Japanese              Turkish
System    Prec.  Rec.   F       Prec.  Rec.   F
WASP      91.98  74.40  82.86   96.96  62.40  75.93
III + R   87.56  76.00  81.37   93.82  66.80  78.04
Contributions Introduced a hybrid tree representation framework for this task Proposed a new generative model that can be applied to the task of transforming NL sentences to MRs Developed a new dynamic programming algorithm for efficient training and decoding The approach, augmented with reranking, achieves state-of-the-art performance on benchmark corpora, with a notable improvement in recall
Questions?