Unsupervised Pretraining for Semantic Parsing

Presentation transcript:

Unsupervised Pretraining for Semantic Parsing
Jack Koch
Department of Applied Mathematics, Yale University, New Haven, CT
LILY Lab

Introduction
Semantic parsing is the task of mapping natural language sentences to formal meaning representations. Semantic parsing techniques can be applied to different natural languages as well as to different task-specific meaning representations. As in many areas of machine learning, semantic parsing can be learned in supervised or semi-supervised settings, in which a natural language query is paired with either its equivalent logical form or its denotation. This project combines a recent advance in natural language processing, language models pretrained on large amounts of data in an unsupervised manner (unsupervised pretraining), with the well-established use of sequence-to-sequence models for semantic parsing.

Results
My results give strong empirical evidence for the effectiveness of pretrained models in semantic parsing. Although none of my models matched the baseline reported by Dong and Lapata (2016), the models with ELMo strongly outperformed my own baseline, which is the more relevant comparison because differences in implementation and training are controlled for. The ELMo models achieve absolute gains of 3.8%, 8.6%, and 15.7% on ATIS, GEO, and JOBS, respectively. This supports the trend reported in the literature that pretraining matters most for small datasets: ATIS is the largest of the three datasets and JOBS the smallest. These results show that incorporating features pretrained as a language model into a sequence-to-sequence model is a simple way to significantly increase performance. The baseline models were tuned on the validation set to optimize performance, while the ELMo models reused exactly the same architecture and training settings with ELMo features added and no further tuning, yet still improved performance across the board.

Table 1. Sequence-level accuracies of sequence-to-sequence models, with and without ELMo.

Materials and Methods
My baseline is a sequence-to-sequence LSTM network with an attention mechanism, an architecture that has already been used successfully for semantic parsing. All experiments use the same sequence-to-sequence architecture: a bidirectional encoder with 200-dimensional hidden units and 100-dimensional embeddings, a bilinear attention mechanism, 100-dimensional embeddings for the decoder, and a beam size of 5 for decoding. The sequence-to-sequence model with ELMo is identical except for one difference: the 1024-dimensional pretrained ELMo features are concatenated with 100-dimensional GloVe embeddings to give 1124-dimensional inputs to the LSTM encoder. The decoder keeps its randomly initialized 100-dimensional embedding layer. Figure 1 shows how forward and backward LSTM language models are trained independently and how their representations are weighted and combined to yield the ELMo representation.
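To make the encoder input construction above concrete, here is a minimal PyTorch sketch, not the poster's actual code: it forms an ELMo-style representation as a softmax-weighted, scaled sum of frozen biLM layer outputs (the scalar mix of Peters et al., 2018, as depicted in Figure 1), concatenates it with 100-dimensional GloVe-style embeddings to obtain 1124-dimensional token inputs, and feeds those to a bidirectional LSTM encoder with 200-dimensional hidden units. The class names, the choice of three biLM layers, and the random stand-in tensors are illustrative assumptions.

```python
# A minimal sketch of the encoder input described above; NOT the author's
# implementation. The GloVe table, the "precomputed" biLM layer outputs, and
# all names here are stand-ins for illustration.
import torch
import torch.nn as nn


class ElmoStyleScalarMix(nn.Module):
    """Softmax-weighted sum of biLM layers, scaled by gamma (Peters et al., 2018)."""

    def __init__(self, num_layers: int):
        super().__init__()
        self.scalars = nn.Parameter(torch.zeros(num_layers))  # s_j before softmax
        self.gamma = nn.Parameter(torch.ones(1))               # task-specific scale

    def forward(self, layer_outputs: torch.Tensor) -> torch.Tensor:
        # layer_outputs: (num_layers, batch, seq_len, 1024) from a frozen biLM
        weights = torch.softmax(self.scalars, dim=0).view(-1, 1, 1, 1)
        return self.gamma * (weights * layer_outputs).sum(dim=0)


class EncoderWithElmo(nn.Module):
    """BiLSTM encoder over [GloVe (100d) ; ELMo (1024d)] = 1124d token inputs."""

    def __init__(self, vocab_size: int, num_bilm_layers: int = 3):
        super().__init__()
        self.glove = nn.Embedding(vocab_size, 100)  # would be GloVe-initialized
        self.mix = ElmoStyleScalarMix(num_bilm_layers)
        self.encoder = nn.LSTM(input_size=1124, hidden_size=200,
                               batch_first=True, bidirectional=True)

    def forward(self, token_ids: torch.Tensor, bilm_layers: torch.Tensor):
        elmo = self.mix(bilm_layers)                               # (batch, seq, 1024)
        inputs = torch.cat([self.glove(token_ids), elmo], dim=-1)  # (batch, seq, 1124)
        return self.encoder(inputs)                                # outputs, (h_n, c_n)


# Toy usage with random stand-in data.
batch, seq_len = 2, 7
enc = EncoderWithElmo(vocab_size=5000)
token_ids = torch.randint(0, 5000, (batch, seq_len))
bilm_layers = torch.randn(3, batch, seq_len, 1024)  # pretend frozen-biLM outputs
outputs, _ = enc(token_ids, bilm_layers)
print(outputs.shape)                                 # torch.Size([2, 7, 400])
```

Because only the scalar-mix weights and downstream parameters are trained while the biLM stays frozen, this mirrors the poster's setup of adding pretrained features without changing the baseline architecture or training procedure.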
Conclusion
In this project, I have demonstrated the effectiveness of incorporating pretrained features into sequence transduction models. Using language models pretrained in an unsupervised manner to improve performance on downstream supervised tasks has become an increasingly popular trend in NLP and has produced state-of-the-art results on a diverse set of language understanding tasks. This project extends this trend to the realm of language generation by using ELMo to achieve large gains in performance on semantic parsing datasets.

Figure 1. A visualization of the ELMo biLM used to calculate pretrained representations.

Acknowledgement
Many thanks to Dragomir Radev, Alex Fabbri, and the rest of the LILY lab for their support!