Unsupervised Pretraining for Semantic Parsing

Presentation transcript:

Unsupervised Pretraining for Semantic Parsing
Jack Koch
Department of Applied Mathematics, Yale University, New Haven, CT
LILY Lab

Introduction
Semantic parsing is the task of mapping natural language sentences to formal meaning representations. Semantic parsing techniques can be applied to different natural languages as well as to different task-specific meaning representations. As in many areas of machine learning, semantic parsing can be learned in supervised or semi-supervised settings, in which a natural language query is paired with either its equivalent logical form or its denotation. This project combines a recent advance in natural language processing, language models pretrained on large amounts of data in an unsupervised manner (unsupervised pretraining), with the well-established use of sequence-to-sequence models for semantic parsing.

Results
My results give strong empirical evidence for the effectiveness of pretrained models in semantic parsing. Although none of my models matched the baseline reported by Dong and Lapata (2016), the models with ELMo strongly outperformed my own baseline, which is the more relevant comparison because differences in implementation and training are controlled for. The ELMo models achieve absolute gains of 3.8%, 8.6%, and 15.7% on ATIS, GEO, and JOBS, respectively. This supports the trend reported in the literature that pretraining matters most for small datasets: ATIS is the largest of the three datasets and JOBS the smallest. These results show that incorporating features pretrained as a language model into a sequence-to-sequence model is a simple way to significantly increase performance. The baseline models were tuned on the validation set to optimize performance, while the ELMo models reused exactly the same architecture and training settings with ELMo features added and no further tuning, yet still improved performance across the board.

Table 1. Sequence-level accuracies of sequence-to-sequence models, with and without ELMo.

Materials and Methods
My baseline is a sequence-to-sequence LSTM network with an attention mechanism, an architecture that has already been used successfully for semantic parsing. All experiments use the same sequence-to-sequence architecture: a bidirectional encoder with 200-dimensional hidden units and 100-dimensional embeddings, a bilinear attention mechanism, 100-dimensional embeddings for the decoder, and a beam size of 5 for decoding. The sequence-to-sequence model with ELMo is identical except for one difference: the 1024-dimensional pretrained ELMo features are concatenated with 100-dimensional GloVe embeddings to give 1124-dimensional inputs to the LSTM encoder. The decoder keeps its randomly initialized 100-dimensional embedding layer. Figure 1 shows how forward and backward LSTM language models are trained independently and how their representations are weighted and combined to yield the ELMo representation.
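To make the encoder input construction above concrete, here is a minimal PyTorch sketch, not the poster's actual code: it forms an ELMo-style representation as a softmax-weighted, scaled sum of frozen biLM layer outputs (the scalar mix of Peters et al., 2018, as depicted in Figure 1), concatenates it with 100-dimensional GloVe-style embeddings to obtain 1124-dimensional token inputs, and feeds those to a bidirectional LSTM encoder with 200-dimensional hidden units. The class names, the choice of three biLM layers, and the random stand-in tensors are illustrative assumptions.

```python
# A minimal sketch of the encoder input described above; NOT the author's
# implementation. The GloVe table, the "precomputed" biLM layer outputs, and
# all names here are stand-ins for illustration.
import torch
import torch.nn as nn


class ElmoStyleScalarMix(nn.Module):
    """Softmax-weighted sum of biLM layers, scaled by gamma (Peters et al., 2018)."""

    def __init__(self, num_layers: int):
        super().__init__()
        self.scalars = nn.Parameter(torch.zeros(num_layers))  # s_j before softmax
        self.gamma = nn.Parameter(torch.ones(1))               # task-specific scale

    def forward(self, layer_outputs: torch.Tensor) -> torch.Tensor:
        # layer_outputs: (num_layers, batch, seq_len, 1024) from a frozen biLM
        weights = torch.softmax(self.scalars, dim=0).view(-1, 1, 1, 1)
        return self.gamma * (weights * layer_outputs).sum(dim=0)


class EncoderWithElmo(nn.Module):
    """BiLSTM encoder over [GloVe (100d) ; ELMo (1024d)] = 1124d token inputs."""

    def __init__(self, vocab_size: int, num_bilm_layers: int = 3):
        super().__init__()
        self.glove = nn.Embedding(vocab_size, 100)  # would be GloVe-initialized
        self.mix = ElmoStyleScalarMix(num_bilm_layers)
        self.encoder = nn.LSTM(input_size=1124, hidden_size=200,
                               batch_first=True, bidirectional=True)

    def forward(self, token_ids: torch.Tensor, bilm_layers: torch.Tensor):
        elmo = self.mix(bilm_layers)                               # (batch, seq, 1024)
        inputs = torch.cat([self.glove(token_ids), elmo], dim=-1)  # (batch, seq, 1124)
        return self.encoder(inputs)                                # outputs, (h_n, c_n)


# Toy usage with random stand-in data.
batch, seq_len = 2, 7
enc = EncoderWithElmo(vocab_size=5000)
token_ids = torch.randint(0, 5000, (batch, seq_len))
bilm_layers = torch.randn(3, batch, seq_len, 1024)  # pretend frozen-biLM outputs
outputs, _ = enc(token_ids, bilm_layers)
print(outputs.shape)                                 # torch.Size([2, 7, 400])
```

Because only the scalar-mix weights and downstream parameters are trained while the biLM stays frozen, this mirrors the poster's setup of adding pretrained features without changing the baseline architecture or training procedure.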
Conclusion
In this project, I have demonstrated the effectiveness of incorporating pretrained features into sequence transduction models. Using language models pretrained in an unsupervised manner to improve performance on downstream supervised tasks has become an increasingly popular trend in NLP and has produced state-of-the-art results on a diverse set of language understanding tasks. This project extends this trend to the realm of language generation by using ELMo to achieve large gains in performance on semantic parsing datasets.

Figure 1. A visualization of the ELMo biLM used to calculate pretrained representations.

Acknowledgement
Many thanks to Dragomir Radev, Alex Fabbri, and the rest of the LILY lab for their support!