End-To-End Memory Networks

End-To-End Memory Networks Presenters: Vicente Ordonez, Paola Cascante

Motivation Two grand challenges in AI: models that can make multiple computational steps to answer a question or complete a task, and models that can describe long-term dependencies in sequential data.

Synthetic QA Experiments

bAbI dataset

Approach The model takes as input a set of sentences x1, ..., xn (to store in memory) and a query q, and outputs an answer a, as sketched below.
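A minimal NumPy sketch of a single memory hop under the paper's notation (the embedding size d, the sentence count n, and the toy bag-of-words inputs are illustrative assumptions, not values from the slides): sentences are embedded into memory vectors with A and output vectors with C, the query with B, attention is a softmax over the match between query and memories, and the answer is predicted from the retrieved vector.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

d, V, n = 20, 177, 5                  # d and n are illustrative; V = 177 as in the bAbI tasks
A = 0.1 * np.random.randn(d, V)       # input memory embedding
C = 0.1 * np.random.randn(d, V)       # output memory embedding
B = 0.1 * np.random.randn(d, V)       # question embedding
W = 0.1 * np.random.randn(V, d)       # answer prediction matrix

x = np.random.randint(0, 2, size=(n, V)).astype(float)   # toy BoW sentence vectors x_1..x_n
q = np.random.randint(0, 2, size=V).astype(float)        # toy BoW question vector

m = x @ A.T                  # memory vectors   m_i = A x_i
c = x @ C.T                  # output vectors   c_i = C x_i
u = B @ q                    # internal state   u = B q
p = softmax(m @ u)           # attention over memories  p_i = softmax(u^T m_i)
o = p @ c                    # response vector  o = sum_i p_i c_i
a_hat = softmax(W @ (o + u)) # predicted answer distribution over the vocabulary
```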

Proposal An RNN architecture in which the recurrence reads from a possibly large external memory multiple times before outputting a symbol (see the multi-hop sketch below). The continuity of the model means it can be trained end-to-end from input-output pairs. It uses a global memory with shared read and write functions, and performs multiple computational steps ("hops").
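Building on the single-hop sketch above, the multi-hop recurrence can be written as follows: the internal state is updated as u^{k+1} = u^k + o^k and the memory is re-read at every hop before the final prediction (a sketch, not the authors' code; A_list and C_list hold one embedding matrix per hop).

```python
def forward_multi_hop(x, q, A_list, C_list, B, W):
    """Read the external memory once per hop, then predict an answer (sketch)."""
    u = B @ q                      # u^1 = B q
    for A_k, C_k in zip(A_list, C_list):
        m = x @ A_k.T              # memories for this hop
        c = x @ C_k.T
        p = softmax(m @ u)         # attention over the external memory
        o = p @ c                  # read step
        u = u + o                  # hop update: u^{k+1} = u^k + o^k
    return softmax(W @ u)          # equivalently softmax(W (o^K + u^K))
```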

Proposal The approach is related to models that use explicit storage and a notion of attention: it can be seen as a continuous form of Memory Networks, and as RNNSearch with multiple computational steps per output symbol.

Synthetic QA Experiments There are a total of 20 different types of tasks that probe different forms of reasoning and deduction. A task is a set of example problems. A problem is a set of I sentences {x_i}, where I ≤ 320, a question sentence q, and an answer a. The vocabulary is of size V = 177. Two versions of the data are used: one with 1,000 training problems per task and a second, larger one with 10,000 per task.
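An illustrative bAbI-style problem, shown as a plain Python dict (both the story and the dict layout are illustrative assumptions, not taken from the slides):

```python
problem = {
    "sentences": [                       # the story x_1, ..., x_I stored in memory
        "Sam walks into the kitchen.",
        "Sam picks up an apple.",
        "Sam walks into the bedroom.",
        "Sam drops the apple.",
    ],
    "question": "Where is the apple?",   # query q
    "answer": "bedroom",                 # single-word answer a
}
```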

Model Details K = 3 hops were used, with adjacent weight sharing: the output embedding for one layer is the input embedding for the one above, i.e. A^{k+1} = C^k; the answer prediction matrix is the same as the final output embedding, i.e. W^T = C^K; and the question embedding is the same as the input embedding of the first layer, i.e. B = A^1.
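One way to realize this adjacent weight sharing in the sketch above is to allocate K + 1 embedding matrices and tie the per-hop parameters to them (the allocation scheme is an implementation assumption; the tying itself matches the slide):

```python
K = 3
embeds = [0.1 * np.random.randn(d, V) for _ in range(K + 1)]  # E^1 ... E^{K+1}

A_list = embeds[:K]      # A^k = E^k
C_list = embeds[1:]      # C^k = E^{k+1}, so A^{k+1} = C^k (adjacent sharing)
B = A_list[0]            # question embedding tied to the first input embedding: B = A^1
W = C_list[-1].T         # answer matrix tied to the final output embedding: W^T = C^K

a_hat = forward_multi_hop(x, q, A_list, C_list, B, W)
```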

Model Details Two different sentence representations are considered: a bag-of-words (BoW) representation that takes the sentence, embeds each word, and sums the resulting vectors, i.e. m_i = Σ_j A x_ij and c_i = Σ_j C x_ij; and Position Encoding (PE), which encodes the position of words within the sentence via element-wise weighting, m_i = Σ_j l_j · A x_ij. Temporal encoding: the memory vector is modified to m_i = Σ_j A x_ij + T_A(i), where T_A(i) is the ith row of a special matrix T_A that encodes temporal information.
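A sketch of the Position Encoding weights from the paper, l_kj = (1 − j/J) − (k/d)(1 − 2j/J), where J is the number of words in the sentence and d the embedding dimension:

```python
def position_encoding(J, d):
    """Return the d x J matrix of PE weights l_kj."""
    j = np.arange(1, J + 1)[None, :]   # word positions 1..J, shape (1, J)
    k = np.arange(1, d + 1)[:, None]   # embedding dimensions 1..d, shape (d, 1)
    return (1 - j / J) - (k / d) * (1 - 2 * j / J)

# With PE, each word embedding A x_ij is weighted element-wise by the jth column
# of this matrix before summing over words: m_i = sum_j l_j * (A x_ij).
```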

Model Details Learning time invariance by injecting random noise: "dummy" memories are added to regularize T_A. Random noise (RN): 10% empty memories are added to the stories at training time.
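A sketch of this random-noise regularizer: at training time, roughly 10% empty (all-zero) memories are inserted into the story at random positions (the function name and exact insertion scheme are illustrative assumptions; V is reused from the earlier sketch).

```python
import random

def add_random_noise(story, noise_frac=0.1):
    """Insert ~noise_frac empty memories at random positions (training only)."""
    noisy = list(story)                       # story: list of BoW sentence vectors
    n_empty = max(1, int(round(noise_frac * len(story))))
    for _ in range(n_empty):
        pos = random.randint(0, len(noisy))   # random insertion point
        noisy.insert(pos, np.zeros(V))        # "dummy" empty memory
    return noisy
```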

Training Details 10% of the bAbI training set was held out to form a validation set. The learning rate was η = 0.01, annealed by halving every 25 epochs until 100 epochs were reached. The weights were initialized randomly from a Gaussian distribution with zero mean and σ = 0.1.
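One way to implement the stated schedule (η = 0.01, halved every 25 epochs, training stopped at 100 epochs); this is a sketch of the rule on the slide, not the authors' training code.

```python
def learning_rate(epoch, eta0=0.01, anneal_every=25, max_epochs=100):
    """Halve the learning rate every `anneal_every` epochs up to `max_epochs`."""
    steps = min(epoch, max_epochs) // anneal_every
    return eta0 * (0.5 ** steps)

# e.g. epochs 0-24 -> 0.01, 25-49 -> 0.005, 50-74 -> 0.0025, 75-99 -> 0.00125
```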

Baselines MemNN: the strongly supervised AM+NG+NL Memory Networks approach. MemNN-WSH: a weakly supervised heuristic version of MemNN in which the supporting sentence labels are not used in training. LSTM: a standard LSTM model, trained using question/answer pairs only.

Results The experiments compare: BoW vs. Position Encoding (PE) sentence representations; training on all 20 tasks independently vs. jointly; linear start (LS) training vs. training with softmaxes from the start; and varying the number of memory hops from 1 to 3.

Example predictions on the QA tasks

Language Modeling Experiments The perplexity on the test sets of the Penn Treebank and Text8 corpora.

Language Modeling Experiments Average activation weight of memory positions during 6 memory hops. White color indicates where the model is attending during the kth hop.

Related Work LSTM-based models capture state through local memory cells. Stack-augmented recurrent nets add push and pop operations. The Neural Turing Machine of Graves uses both content- and address-based access. A bidirectional RNN encoder and a gated RNN decoder have been used for machine translation. Combinations of n-grams with a cache.

Conclusions A recurrent attention mechanism for reading the memory can be successfully trained via backpropagation on diverse tasks, with no supervision of supporting facts. It slightly outperforms tuned RNNs and LSTMs of comparable complexity.

Conclusions The model is still unable to match the performance of memory networks trained with strong supervision. Smooth lookups may not scale well to cases where a larger memory is required. Extra material: Oral Session: End-To-End Memory Networks, https://www.youtube.com/watch?v=ZwvWY9Yy76Q