A Generative Model for Parsing Natural Language to Meaning Representations
Wei Lu, Hwee Tou Ng, Wee Sun Lee (National University of Singapore); Luke S. Zettlemoyer (Massachusetts Institute of Technology)

Presentation transcript:

A Generative Model for Parsing Natural Language to Meaning Representations
Wei Lu, Hwee Tou Ng, Wee Sun Lee (National University of Singapore)
Luke S. Zettlemoyer (Massachusetts Institute of Technology)

Classic Goal of NLP: Understanding Natural Language
Mapping Natural Language (NL) to Meaning Representations (MR)
Example: "How many states do not have rivers ?"
[Figure: a natural language sentence paired with its meaning representation]

Meaning Representation (MR)
How many states do not have rivers ?
QUERY:answer(NUM)
  NUM:count(STATE)
    STATE:exclude(STATE STATE)
      STATE:state(all)
      STATE:loc_1(RIVER)
        RIVER:river(all)

MR Production
A meaning representation production (MR production), for example NUM:count(STATE):
  Semantic category: NUM
  Function symbol: count
  Child semantic category: STATE
An MR production has at most 2 child semantic categories.
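To make this concrete, an MR production can be modeled as a small record holding the semantic category, function symbol, and child semantic categories. The following Python sketch is illustrative only; the class and field names are not from the paper.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MRProduction:
    """One meaning representation production, e.g. NUM:count(STATE)."""
    semantic_category: str                    # e.g. "NUM"
    function_symbol: str                      # e.g. "count"
    child_categories: List[str] = field(default_factory=list)  # e.g. ["STATE"]

    def __post_init__(self):
        # An MR production has at most 2 child semantic categories.
        assert len(self.child_categories) <= 2

    def __str__(self):
        # In the running example, productions without a child category
        # are written with "all", e.g. STATE:state(all).
        args = " ".join(self.child_categories) if self.child_categories else "all"
        return f"{self.semantic_category}:{self.function_symbol}({args})"

print(MRProduction("NUM", "count", ["STATE"]))   # NUM:count(STATE)
print(MRProduction("STATE", "state"))            # STATE:state(all)
```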

Task Description
Training data: NL-MR pairs
Input: a new NL sentence
Output: its MR

Challenge
The mapping from individual NL words to their associated MR productions is not given in the NL-MR pairs.

Mapping Words to MR Productions
QUERY:answer(NUM)
  NUM:count(STATE)
    STATE:exclude(STATE STATE)
      STATE:state(all)
      STATE:loc_1(RIVER)
        RIVER:river(all)
how many states do not have rivers ?
[Figure: alignment between the words of the sentence and the MR productions]

Talk Outline
Generative model
  Goal: a flexible model that can parse a wide range of input sentences
  Efficient algorithms for EM training and decoding
  In practice: the correct output is often in the top-k list, but is not always the best-scoring option
Reranking
  Global features
Evaluation
  The generative model combined with the reranking technique achieves state-of-the-art performance

NL-MR Pair → Hybrid Tree
An NL-MR pair is represented as a hybrid tree: each node is an MR production, and the children of a node form a hybrid sequence of NL words and child MR productions.
MR productions: QUERY:answer(NUM), NUM:count(STATE), STATE:exclude(STATE STATE), STATE:state(all), STATE:loc_1(RIVER), RIVER:river(all)
NL sentence: How many states do not have rivers ?

Model Parameters: MR model parameters ρ(m' | m, arg=k)
w: the NL sentence, m: the MR, T: the hybrid tree

Hybrid tree for the example (each MR production is followed by its hybrid sequence of NL words and child productions):
  QUERY:answer(NUM)           ->  NUM:count(STATE) ?
  NUM:count(STATE)            ->  How many STATE:exclude(STATE STATE)
  STATE:exclude(STATE STATE)  ->  STATE:state(all) do not STATE:loc_1(RIVER)
  STATE:state(all)            ->  states
  STATE:loc_1(RIVER)          ->  have RIVER:river(all)
  RIVER:river(all)            ->  rivers

P(w, m, T) = P(QUERY:answer(NUM) | -, arg=1)
           * P(NUM ? | QUERY:answer(NUM))
           * P(NUM:count(STATE) | QUERY:answer(NUM), arg=1)
           * P(How many STATE | NUM:count(STATE))
           * P(STATE:exclude(STATE STATE) | NUM:count(STATE), arg=1)
           * P(STATE1 do not STATE2 | STATE:exclude(STATE STATE))
           * P(STATE:state(all) | STATE:exclude(STATE STATE), arg=1)
           * P(states | STATE:state(all))
           * P(STATE:loc_1(RIVER) | STATE:exclude(STATE STATE), arg=2)
           * P(have RIVER | STATE:loc_1(RIVER))
           * P(RIVER:river(all) | STATE:loc_1(RIVER), arg=1)
           * P(rivers | RIVER:river(all))

The factors of the form P(m' | m, arg=k) are the MR model parameters ρ(m' | m, arg=k).
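The factorization above can be read off the hybrid tree recursively: one MR-model factor per production (conditioned on its parent and argument position) and one hybrid-sequence factor per production. Below is a minimal Python sketch of that traversal; `rho` and `seq_logprob` are hypothetical lookup structures standing in for the trained parameter tables, and for simplicity the sketch assumes child productions appear in the hybrid sequence in argument order.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Node:
    """A hybrid tree node: an MR production together with its hybrid
    sequence of NL words and child nodes (at most two child nodes)."""
    production: str
    sequence: List[Union[str, "Node"]] = field(default_factory=list)

def log_joint(node, rho, seq_logprob, parent="-", arg=1):
    """log P(w, m, T) for the subtree rooted at `node`.

    rho[(child, parent, arg)]          : log MR model parameter rho(m' | m, arg=k)
    seq_logprob(production, sequence)  : log of the hybrid-sequence factor,
        i.e. the (pattern x emission) probability shown on the next slides.
    """
    score = rho[(node.production, parent, arg)]
    score += seq_logprob(node.production, node.sequence)
    k = 0
    for item in node.sequence:
        if isinstance(item, Node):        # a child MR production
            k += 1                        # its argument position (assumed = order)
            score += log_joint(item, rho, seq_logprob, node.production, k)
    return score
```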

Model Parameters: pattern parameters Φ(r | m)
(same example hybrid tree as above)
The probability of a hybrid sequence decomposes as, for example:
P(How many STATE | NUM:count(STATE))
  = P(M → w Y | NUM:count(STATE))            [pattern parameter Φ(r | m)]
  * P(How | NUM:count(STATE), BEGIN)
  * P(many | NUM:count(STATE), How)
  * P(STATE | NUM:count(STATE), many)
  * P(END | NUM:count(STATE), STATE)

Hybrid Patterns

#RHS   Hybrid Pattern             # Patterns
0      M → w                      1
1      M → [w] Y [w]              4
2      M → [w] Y [w] Z [w]        8
       M → [w] Z [w] Y [w]        8

M is an MR production, w is a word sequence, and Y and Z are respectively the first and second child MR productions. [] denotes that the word sequence is optional.
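The pattern counts follow from making the word sequence in each slot optional: with one child there are 2^2 = 4 placements, and with two children (in a fixed order) 2^3 = 8. A small sketch that enumerates them, purely for illustration:

```python
from itertools import product

def hybrid_patterns(children):
    """Enumerate hybrid patterns for one fixed ordering of child categories:
    an optional word sequence [w] may precede, separate, and follow them.
    (With no children the only pattern is M -> w, since M must produce something.)"""
    slots = len(children) + 1
    for mask in product([False, True], repeat=slots):
        parts = []
        for i, child in enumerate(children):
            if mask[i]:
                parts.append("w")
            parts.append(child)
        if mask[-1]:
            parts.append("w")
        yield "M -> " + " ".join(parts)

print(list(hybrid_patterns(["Y"])))              # 4 patterns: Y, Y w, w Y, w Y w
print(len(list(hybrid_patterns(["Y", "Z"]))))    # 8; the order Z .. Y contributes 8 more
```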

Model Parameters: emission parameters θ(t | m, Λ)
(same example hybrid tree as above)
P(How many STATE | NUM:count(STATE))
  = P(M → w Y | NUM:count(STATE))
  * P(How | NUM:count(STATE), BEGIN)
  * P(many | NUM:count(STATE), How)
  * P(STATE | NUM:count(STATE), many)
  * P(END | NUM:count(STATE), STATE)
The factors after the pattern parameter are emission parameters θ(t | m, Λ): t is an NL word or a child semantic category, and Λ is the context.

Assumptions: Models I, II, III
Example hybrid sequence under NUM:count(STATE): BEGIN How many STATE END
Model I (unigram model):   θ(t_i | M, Λ) = P(t_i | M)
Model II (bigram model):   θ(t_i | M, Λ) = P(t_i | M, t_{i-1})
Model III (mixgram model): θ(t_i | M, Λ) = 0.5 * [ P(t_i | M, t_{i-1}) + P(t_i | M) ]
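A minimal sketch of the three emission parameterizations, assuming unigram and bigram probability tables (`uni`, `bi`) have already been estimated; the table layout here is an illustrative assumption, not the paper's implementation.

```python
def emit_unigram(uni, t, m):
    """Model I: theta(t | m, context) = P(t | m), ignoring the context."""
    return uni.get((t, m), 0.0)

def emit_bigram(bi, t, m, prev):
    """Model II: theta(t | m, context) = P(t | m, t_prev)."""
    return bi.get((t, m, prev), 0.0)

def emit_mixgram(uni, bi, t, m, prev):
    """Model III: an even interpolation of the bigram and unigram estimates."""
    return 0.5 * (emit_bigram(bi, t, m, prev) + emit_unigram(uni, t, m))

def sequence_prob(uni, bi, tokens, m):
    """Probability of the padded sequence BEGIN t_1 ... t_n END under Model III,
    e.g. tokens = ["How", "many", "STATE"] for the example above."""
    padded = ["BEGIN"] + tokens + ["END"]
    p = 1.0
    for prev, t in zip(padded, padded[1:]):
        p *= emit_mixgram(uni, bi, t, m, prev)
    return p
```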

Model Parameters
MR model parameters: Σ_{m_i} ρ(m_i | m_j, arg=k) = 1
  They model the meaning representation.
Emission parameters: Σ_t θ(t | m_j, Λ) = 1
  They model the emission of words and semantic categories from MR productions; Λ is the context.
Pattern parameters: Σ_r Φ(r | m_j) = 1
  They model the selection of hybrid patterns.

Parameter Estimation
MR model parameters are easy to estimate.
Learning the emission parameters and pattern parameters is challenging, since the correct hybrid tree is not observed.
Inside-outside algorithm with EM.
Naive implementation: O(n^6 * m), where n is the number of words in the NL sentence and m is the number of MR productions in the MR.
Improved, efficient algorithm: two-layer dynamic programming, with improved time complexity O(n^3 * m) (a structural sketch of the training loop follows).
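At a high level, training alternates an E-step that collects expected counts over all hybrid trees consistent with each NL-MR pair and an M-step that renormalizes the parameter tables. The skeleton below only shows that structure; `initialize_zero_counts`, `accumulate`, `expected_counts`, and `normalize` are hypothetical placeholders, with `expected_counts` standing in for the two-layer O(n^3 * m) inside-outside dynamic program.

```python
def em_train(pairs, params, iterations=30):
    """Hedged EM skeleton for the emission and pattern parameters.
    pairs  : list of (NL sentence, MR) training pairs
    params : emission and pattern parameter tables (the MR model parameters
             are estimated directly and are not re-trained here)."""
    for _ in range(iterations):
        counts = initialize_zero_counts(params)          # hypothetical helper
        for sentence, mr in pairs:
            # E-step: expected emission/pattern counts over all hybrid trees
            # consistent with (sentence, mr), via inside-outside.
            accumulate(counts, expected_counts(sentence, mr, params))
        # M-step: renormalize so the sum-to-one constraints above hold.
        params = normalize(counts)
    return params
```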

Decoding
Given an NL sentence w, find the optimal MR m*:
  m* = argmax_m P(m | w) = argmax_m Σ_T P(m, T | w) = argmax_m Σ_T P(w, m, T)
Instead of summing over all hybrid trees, we find the most likely hybrid tree:
  m* = argmax_m max_T P(w, m, T)
Similar dynamic programming techniques are employed.
An exact top-k decoding algorithm is also implemented (see the sketch below).
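The approximation argmax_m max_T can be read directly off a top-k list: among the candidate hybrid trees, keep only the best-scoring tree for each MR. A minimal sketch, with the top-k decoder itself assumed as input:

```python
def best_mr(candidates):
    """candidates: (mr, hybrid_tree, log_prob) triples, e.g. the output of the
    exact top-k decoding algorithm.  Returns the MR whose single best hybrid
    tree scores highest, i.e. argmax_m max_T log P(w, m, T)."""
    best = {}
    for mr, tree, logp in candidates:
        if mr not in best or logp > best[mr]:
            best[mr] = logp
    return max(best, key=best.get)
```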

Reranking
Weakness of the generative model: it lacks the ability to model long-range dependencies.
Reranking with the averaged perceptron (a sketch follows this slide):
  Output space: hybrid trees from the exact top-k (k = 50) decoding algorithm, for each training/testing instance's NL sentence
  Single correct reference: the output of the Viterbi algorithm for each training instance
  Feature functions: features 1-5 are indicator functions, while feature 6 is real-valued
  A threshold b prunes unreliable predictions even when they score the highest, in order to optimize F-measure
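A compact sketch of averaged-perceptron reranking over the top-k hybrid trees. Here `features(tree)` is assumed to return a sparse feature vector (the six feature types listed on the next slide), and the single correct reference tree is the one described above; all names are illustrative.

```python
from collections import defaultdict

def score(w, feats):
    return sum(w[k] * v for k, v in feats.items())

def update(w, feats, direction):
    for k, v in feats.items():
        w[k] += direction * v

def train_reranker(data, features, epochs=10):
    """data: list of (reference_tree, candidate_trees) per training sentence,
    where candidate_trees come from the exact top-k (k = 50) decoder.
    Returns averaged perceptron weights."""
    w = defaultdict(float)        # current weights
    total = defaultdict(float)    # running sum of weight vectors, for averaging
    n = 0
    for _ in range(epochs):
        for reference, candidates in data:
            # pick the candidate hybrid tree that scores highest under w
            pred = max(candidates, key=lambda t: score(w, features(t)))
            if features(pred) != features(reference):
                # standard perceptron update toward the reference hybrid tree
                update(w, features(reference), +1.0)
                update(w, features(pred), -1.0)
            for k, v in w.items():
                total[k] += v
            n += 1
    return {k: v / n for k, v in total.items()}
```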

Reranking Features: Examples
(defined over the example hybrid tree shown earlier)
Feature 1: Hybrid Rule: an MR production and its child hybrid sequence
Feature 2: Expanded Hybrid Rule: an MR production and its child hybrid sequence, expanded
Feature 3: Long-range Unigram: an MR production and an NL word appearing below it in the tree
Feature 4: Grandchild Unigram: an MR production and its grandchild NL word
Feature 5: Two-Level Unigram: an MR production, its parent production, and its child NL word
Feature 6: Model Log-Probability: log(P(w, m, T)), the logarithm of the base model's joint probability

Related Work
SILT (Kate, Wong, and Mooney, 2005): a system that learns deterministic rules to transform either sentences or their syntactic parse trees into meaning structures.
WASP (Wong and Mooney, 2006): a system motivated by statistical machine translation techniques.
KRISP (Kate and Mooney, 2006): a discriminative approach in which meaning representation structures are constructed from natural language strings hierarchically.

Evaluation Metrics
Precision = # correct output structures / # output structures
Recall = # correct output structures / # input sentences
F measure = 2 / (1/Precision + 1/Recall)
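These definitions translate directly into code; a minimal sketch:

```python
def evaluate(num_correct, num_output, num_input):
    """Precision, recall, and F measure as defined on this slide."""
    precision = num_correct / num_output
    recall = num_correct / num_input
    f_measure = 2.0 / (1.0 / precision + 1.0 / recall)   # harmonic mean of P and R
    return precision, recall, f_measure

# e.g. 75 correct MRs out of 80 produced, for 100 input sentences:
# evaluate(75, 80, 100) -> (0.9375, 0.75, 0.8333...)
```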

Evaluations
Comparison over the three models. I/II/III: unigram/bigram/mixgram model; +R: with reranking.
Reranking is shown to be effective. Overall, Model III with reranking performs the best.

Model     Geoquery (880)           Robocup (300)
          Prec.  Rec.   F          Prec.  Rec.   F
I         81.3   77.1   79.1       71.1   64.0   67.4
II        89.0   76.0   82.0       82.4   57.7   67.8
III       86.2   81.8   84.0       70.4   63.3   66.7
I + R     87.5   80.5   83.8       -      67.0   72.6
II + R    93.2   73.6   82.3       88.4   56.0   68.6
III + R   89.3   81.5   85.2       82.5   67.7   74.4

Evaluations
Comparison with other systems.
On Geoquery: able to handle more than 25% of the inputs that could not be handled by previous systems; an error reduction rate of 22%.

System          Geoquery (880)           Robocup (300)
                Prec.  Rec.   F          Prec.  Rec.   F
SILT            89.0   54.1   67.3       83.9   50.7   63.2
WASP            87.2   74.8   80.5       88.9   61.9   73.0
KRISP           93.3   71.7   81.1       85.2   -      -
Model III + R   89.3   81.5   85.2       82.5   67.7   74.4

Evaluations
Comparison on other languages. Achieves performance comparable to the previous system.

System          English                  Spanish
                Prec.  Rec.   F          Prec.  Rec.   F
WASP            95.42  70.00  80.76      91.99  72.40  81.03
Model III + R   91.46  72.80  81.07      95.19  79.20  86.46

System          Japanese                 Turkish
                Prec.  Rec.   F          Prec.  Rec.   F
WASP            91.98  74.40  82.86      96.96  62.40  75.93
Model III + R   87.56  76.00  81.37      93.82  66.80  78.04

Contributions
Introduced a hybrid tree representation framework for this task.
Proposed a new generative model that can be applied to the task of transforming NL sentences into MRs.
Developed a new dynamic programming algorithm for efficient training and decoding.
The approach, augmented with reranking, achieves state-of-the-art performance on benchmark corpora, with a notable improvement in recall.

Questions?