End-To-End Memory Networks Presenters: Vicente Ordonez, Paola Cascante
Motivation Two grand challenges in AI: models that can make multiple computational steps to answer a question or complete a task, and models that can describe long-term dependencies in sequential data.
Synthetic QA Experiments
bAbI dataset
Approach The model takes as input a set of sentences x1, ..., xn (to be stored in memory) and a query q, and outputs an answer a.
Proposal An RNN architecture where the recurrence reads from a possibly large external memory multiple times before outputting a symbol. The continuity of the model means it can be trained end-to-end from input-output pairs. It uses a global memory with shared read and write functions, and makes multiple computational steps ("hops") per output.
Proposal Models that use explicit storage and a notion of attention: a continuous form of Memory Networks, and a version of RNNsearch with multiple computational steps per output symbol.
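To make the read-from-memory recurrence concrete, here is a minimal numpy sketch of a single memory hop under these ideas, with toy dimensions, random parameters, and illustrative variable names (the actual implementation details are assumptions, not the authors' code): sentences and the question are embedded, a softmax over inner products gives an attention distribution over memories, and the answer is predicted from the attended sum plus the question state.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

d, V, n = 20, 177, 10            # embedding dim, vocab size, number of memories (toy values)

rng = np.random.default_rng(0)
A = rng.normal(0, 0.1, (d, V))   # input memory embedding
C = rng.normal(0, 0.1, (d, V))   # output memory embedding
B = rng.normal(0, 0.1, (d, V))   # question embedding
W = rng.normal(0, 0.1, (V, d))   # answer prediction matrix

x = rng.integers(0, 2, (n, V)).astype(float)  # bag-of-words sentence vectors (illustrative data)
q = rng.integers(0, 2, V).astype(float)       # bag-of-words question

m = x @ A.T                      # memory vectors m_i = A x_i
c = x @ C.T                      # output vectors c_i = C x_i
u = B @ q                        # internal state u = B q

p = softmax(m @ u)               # attention over memories: p_i = softmax(u^T m_i)
o = p @ c                        # response vector o = sum_i p_i c_i
a_hat = softmax(W @ (o + u))     # predicted answer distribution over the vocabulary
```

Everything in this pipeline is differentiable, which is what allows end-to-end training from input-output pairs without supervision of which memory to attend to.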
Synthetic QA Experiments There are a total of 20 different types of tasks that probe different forms of reasoning and deduction. A task is a set of example problems. A problem is a set of I sentences {xi} where I ≤ 320, a question sentence q, and an answer a. The vocabulary is of size V = 177. Two versions of the data are used: one with 1,000 training problems per task and a second, larger one with 10,000 per task.
Model Details K = 3 hops were used, with adjacent weight sharing: the output embedding for one layer is the input embedding for the one above, i.e. A^{k+1} = C^k; the answer prediction matrix is the same as the final output embedding, i.e. W^T = C^K; and the question embedding is the same as the input embedding of the first layer, i.e. B = A^1.
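The sketch below extends the single-hop example to K hops with adjacent weight sharing; it is an illustrative numpy implementation of the tying scheme above (A^{k+1} = C^k, W^T = C^K, B = A^1), not the authors' code.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

d, V, n, K = 20, 177, 10, 3      # toy dimensions; K = 3 hops as in the paper
rng = np.random.default_rng(0)

# K+1 embedding matrices: A^1 = emb[0], and C^k = emb[k], so A^{k+1} = C^k automatically.
emb = [rng.normal(0, 0.1, (d, V)) for _ in range(K + 1)]
B = emb[0]                       # question embedding tied to A^1
W = emb[K].T                     # answer matrix tied to final output embedding, W^T = C^K

x = rng.integers(0, 2, (n, V)).astype(float)  # bag-of-words sentences (illustrative)
q = rng.integers(0, 2, V).astype(float)       # bag-of-words question

u = B @ q
for k in range(K):
    m = x @ emb[k].T             # input memories for hop k (A^k)
    c = x @ emb[k + 1].T         # output memories for hop k (C^k)
    p = softmax(m @ u)           # attention over memories
    o = p @ c                    # response vector
    u = o + u                    # state update between hops
a_hat = softmax(W @ u)           # final answer prediction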
Model Details Two different sentence representations: a bag-of-words (BoW) representation that embeds each word of the sentence and sums the resulting vectors, e.g. m_i = Σ_j A x_ij and c_i = Σ_j C x_ij; and Position Encoding (PE), which encodes the position of words within the sentence via an element-wise weighting, m_i = Σ_j l_j · A x_ij with l_kj = (1 − j/J) − (k/d)(1 − 2j/J). Temporal encoding: modify the memory vector to m_i = Σ_j A x_ij + T_A(i), where T_A(i) is the ith row of a special matrix T_A that encodes temporal information.
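A short sketch of the Position Encoding weights l_kj defined above, using the paper's formula; the numpy helper and its name are assumptions for illustration.

```python
import numpy as np

def position_encoding(J, d):
    """Return a (J, d) matrix L with L[j-1, k-1] = l_kj = (1 - j/J) - (k/d)(1 - 2j/J),
    using the paper's 1-indexed word position j and embedding dimension k."""
    j = np.arange(1, J + 1)[:, None]   # word positions 1..J
    k = np.arange(1, d + 1)[None, :]   # embedding dimensions 1..d
    return (1 - j / J) - (k / d) * (1 - 2 * j / J)

# Usage: m_i = sum_j L[j] * (A @ x_ij), i.e. each word embedding is reweighted
# element-wise by its position before being summed into the memory vector.
L = position_encoding(J=6, d=20)
```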
Model Details Learning time invariance by injecting random noise: add "dummy" memories to regularize T_A. Random Noise (RN): add 10% empty memories to the stories. A sketch of this regularizer follows below.
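A minimal sketch of the Random Noise idea, assuming a story is a list of sentences (lists of word ids) and that an empty memory is simply an empty sentence; the function name and exact insertion scheme are illustrative assumptions.

```python
import random

def add_random_noise(story, ratio=0.1, seed=0):
    """Insert ~ratio empty ("dummy") memories at random positions in a story."""
    rng = random.Random(seed)
    noisy = list(story)
    n_dummy = max(1, int(ratio * len(story)))
    for _ in range(n_dummy):
        pos = rng.randint(0, len(noisy))
        noisy.insert(pos, [])        # empty memory still occupies a temporal slot for T_A
    return noisy

story = [[1, 4, 7], [2, 5], [3, 6, 8]]   # toy word-id sentences
print(add_random_noise(story))
```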
Training Details 10% of the bAbI training set was held out to form a validation set. The learning rate was η = 0.01, annealed every 25 epochs by η/2 until 100 epochs were reached. The weights were initialized randomly from a Gaussian distribution with zero mean and σ = 0.1.
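A small sketch of this schedule, assuming the annealing step halves the learning rate each time (one common reading of "annealing by η/2"):

```python
def learning_rate(epoch, eta0=0.01, anneal_every=25, factor=0.5):
    """Learning rate at a given epoch: start at eta0 and halve every `anneal_every` epochs."""
    return eta0 * (factor ** (epoch // anneal_every))

for e in (0, 24, 25, 50, 75, 99):
    print(e, learning_rate(e))   # 0.01, 0.01, 0.005, 0.0025, 0.00125, 0.00125
```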
Baselines MemNN: the strongly supervised AM+NG+NL Memory Networks approach. MemNN-WSH: a weakly supervised heuristic version of MemNN where the supporting sentence labels are not used in training. LSTM: a standard LSTM model, trained using question/answer pairs only.
Results Ablations compare: BoW vs Position Encoding (PE) sentence representations; training on all 20 tasks independently vs jointly; Linear Start (LS) training vs training with softmaxes from the start; and varying the number of memory hops from 1 to 3.
Example predictions on the QA tasks
Language Modeling Experiments The perplexity on the test sets of Penn Treebank and Text8 corpora
Language Modeling Experiments Average activation weight of memory positions during 6 memory hops; white indicates where the model is attending during the kth hop.
Related Work LSTM-based models that capture long-term structure through local memory cells; stack-augmented recurrent nets with push and pop operations; the Neural Turing Machine of Graves, which uses both content- and address-based access; bidirectional RNN encoders with gated RNN decoders used for machine translation; and combinations of n-grams with a cache.
Conclusions A recurrent attention mechanism for reading the memory can be successfully trained via backpropagation on diverse tasks, with no supervision of supporting facts. It slightly outperforms tuned RNNs and LSTMs of comparable complexity.
Conclusions The model is still unable to match the performance of memory networks trained with strong supervision, and smooth lookups may not scale well to the case where a larger memory is required. Extra material: Oral Session: End-To-End Memory Networks, https://www.youtube.com/watch?v=ZwvWY9Yy76Q