End-To-End Memory Networks


1 End-To-End Memory Networks
Presenters: Vicente Ordonez, Paola Cascante

2 Motivation Two grand challenges in AI:
Models that can make multiple computational steps to answer a question or complete a task
Models that can describe long-term dependencies in sequential data

3 Synthetic QA Experiments

5 bAbI dataset

6 Approach The model takes as input sentences x1,...,xn (to be stored in memory) and a query q, and outputs an answer a

7 Proposal An RNN architecture in which the recurrence reads from a possibly large external memory multiple times before outputting a symbol. The continuity of the model means it can be trained end-to-end from input-output pairs. It uses a global memory with shared read and write functions, and takes multiple computational steps (“hops”); a sketch of the multi-hop read follows.
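Below is a NumPy sketch of the k-hop read under assumed toy dimensions, not the authors' code: the memory and output vectors are taken as already-embedded rows, and all names and shapes are illustrative.

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def memory_hop(m, c, u):
    # One hop: attention p over the n memory slots, then a weighted sum of the
    # output vectors o, added to the controller state u (u^(k+1) = u^k + o^k).
    p = softmax(m @ u)      # (n,) attention weights
    o = c.T @ p             # (d,) response vector
    return u + o

# Toy dimensions: n memory slots, embedding size d, vocabulary size V, K hops.
n, d, V, K = 5, 20, 50, 3
rng = np.random.default_rng(0)
m = [rng.normal(0, 0.1, (n, d)) for _ in range(K)]   # memory vectors m_i per hop
c = [rng.normal(0, 0.1, (n, d)) for _ in range(K)]   # output vectors c_i per hop
W = rng.normal(0, 0.1, (V, d))                       # answer prediction matrix
u = rng.normal(0, 0.1, d)                            # embedded question u = Bq (placeholder)

for k in range(K):
    u = memory_hop(m[k], c[k], u)
a_hat = softmax(W @ u)                               # predicted answer distribution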

8 Proposal Models that use explicit storage and a notion of attention:
A continuous form of Memory Networks
RNNsearch with multiple computational steps per output symbol

11 Synthetic QA Experiments
There are a total of 20 different types of tasks that probe different forms of reasoning and deduction. A task is a set of example problems. A problem is a set of I sentences {xi}, where I ≤ 320, a question sentence q, and an answer a. The vocabulary is of size V = 177. Two versions of the data are used: one with 1,000 training problems per task and a second, larger one with 10,000 per task. A small illustrative problem is sketched below.
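For concreteness, a toy problem in the style of bAbI task 1 (single supporting fact); the sentences are illustrative rather than quoted from the dataset.

problem = {
    "story": ["Mary moved to the bathroom.", "John went to the hallway."],
    "question": "Where is Mary?",
    "answer": "bathroom",
}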

12 Model Details K = 3 hops were used, with adjacent weight sharing:
The output embedding for one layer is the input embedding for the one above, i.e. A^(k+1) = C^k. The answer prediction matrix is constrained to be the same as the final output embedding, i.e. W^T = C^K. The question embedding is the same as the input embedding of the first layer, i.e. B = A^1 (a sketch of this tying follows).
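Below is a toy illustration of the adjacent tying, with assumed shapes; the variable names are illustrative, not the authors' code.

import numpy as np

d, V, K = 20, 50, 3                                  # embedding size, vocabulary, hops
rng = np.random.default_rng(0)

C = [rng.normal(0, 0.1, (d, V)) for _ in range(K)]   # output embeddings C^1..C^K
A = [rng.normal(0, 0.1, (d, V))]                     # input embedding A^1
for k in range(K - 1):
    A.append(C[k])                                   # A^(k+1) = C^k
B = A[0]                                             # question embedding B = A^1
W = C[-1].T                                          # answer matrix W^T = C^K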

13 Model Details Two different sentence representations:
A bag-of-words (BoW) representation that embeds each word of the sentence and sums the resulting vectors, i.e. mi = Σj A xij and ci = Σj C xij
A Position Encoding (PE) representation that also encodes the position of words within the sentence: mi = Σj lj · A xij, with element-wise weights lj (a sketch of these weights follows)
Temporal encoding: the memory vector is modified to mi = Σj A xij + TA(i), where TA(i) is the ith row of a special matrix TA that encodes temporal information
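The PE weights come from the paper's formula lkj = (1 - j/J) - (k/d)(1 - 2j/J), with J words per sentence and embedding size d; in the sketch below the word embeddings are random placeholders standing in for A xij.

import numpy as np

def position_encoding(J, d):
    # l_kj = (1 - j/J) - (k/d)(1 - 2j/J), with 1-indexed word position j and
    # embedding dimension k; returns a (J, d) matrix of weights.
    j = np.arange(1, J + 1)[:, None]
    k = np.arange(1, d + 1)[None, :]
    return (1 - j / J) - (k / d) * (1 - 2 * j / J)

J, d = 6, 20
rng = np.random.default_rng(0)
word_embeddings = rng.normal(0, 0.1, (J, d))                   # rows stand in for A x_ij
m_i = (position_encoding(J, d) * word_embeddings).sum(axis=0)  # PE memory vector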

14 Model Details Learning time invariance by injecting random noise:
Add “dummy” memories to regularize TA. Random Noise (RN): add 10% of empty memories to the stories at training time (one plausible implementation is sketched below).
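The sketch below inserts empty “dummy” memories at roughly a 10% rate so the temporal encoding TA does not latch onto absolute sentence positions; the exact insertion scheme here is an assumption.

import random

def add_dummy_memories(story, rate=0.10, seed=0):
    # Insert an empty memory slot after each sentence with probability `rate`.
    rng = random.Random(seed)
    noisy = []
    for sentence in story:
        noisy.append(sentence)
        if rng.random() < rate:
            noisy.append("")                # empty "dummy" memory
    return noisy

story = ["Mary moved to the bathroom.", "John went to the hallway."]
print(add_dummy_memories(story))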

15 Training Details 10% of the bAbI training set was held out to form a validation set. The learning rate was η = 0.01, annealed by halving (η/2) every 25 epochs until 100 epochs were reached. The weights were initialized randomly from a Gaussian distribution with zero mean and σ = 0.1. The annealing schedule is sketched below.
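A minimal version of that schedule, assuming the rate is simply halved at each 25-epoch boundary:

def learning_rate(epoch, eta0=0.01, anneal_every=25):
    # Halve the learning rate every `anneal_every` epochs, starting from eta0.
    return eta0 * (0.5 ** (epoch // anneal_every))

assert learning_rate(0) == 0.01
assert learning_rate(25) == 0.005
assert learning_rate(99) == 0.00125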

16 Baselines MemNN: the strongly supervised AM+NG+NL Memory Networks approach
MemNN-WSH: a weakly supervised heuristic version of MemNN where the supporting sentence labels are not used in training
LSTM: a standard LSTM model, trained using question/answer pairs only

17 Results BoW vs Position Encoding (PE) sentence representation
Training on all 20 tasks independently vs. training jointly
Linear Start (LS) training, where the softmax in each memory layer is removed at the beginning of training, vs. training with softmaxes from the start
Varying the number of memory hops from 1 to 3

18 Example predictions on the QA tasks

19 Language Modeling Experiments
The perplexity on the test sets of Penn Treebank and Text8 corpora

20 Language Modeling Experiments
Average activation weight of memory positions during 6 memory hops. White color indicates where the model is attending during the kth hop.

21 Related Work LSTM-based models that use local memory cells
Stack-augmented recurrent nets with push and pop operations
The Neural Turing Machine of Graves et al., which uses both content- and address-based access
A bidirectional RNN encoder and gated RNN decoder used for machine translation
Combinations of n-grams with a cache

22 Conclusions A recurrent attention mechanism for reading the memory can be successfully trained via backpropagation on diverse tasks. There is no supervision of supporting facts. The model slightly outperforms tuned RNNs and LSTMs of comparable complexity.

23 Conclusions The model is still unable to match the performance of memory networks trained with strong supervision. Smooth lookups may not scale well to the case where a larger memory is required. Extra material: Oral Session: End-To-End Memory Networks

