Memory Networks for Question Answering
Question Answering
Return the correct answer to a question expressed in natural language. This differs from IR (search), which returns the documents most relevant to a query expressed with keywords. In principle, most AI/NLP tasks can be reduced to QA.
Examples
I: Mary walked to the bathroom. I: Sandra went to the garden. I: Daniel went back to the garden. I: Sandra took the milk there. Q: Where is the milk? A: garden

I: Everybody is happy. Q: What's the sentiment? A: positive

I: Jane has a baby in London. Q: What are the named entities? A: Jane-person, London-location Q: What are the POS tags? A: NNP VBZ DT NN IN NNP

I: I think this book is fun. Q: What is the French translation? A: Je crois que ce livre est amusant.
Difficulty
No single model architecture for NLP is consistently best on all tasks.

Task | Corpus | State of the art
QA | bAbI | Memory Network (Weston et al. 2015)
Sentiment Analysis | SST | Bi-LSTM (Moschitti et al. 2016)
POS | PTB-WSJ | Bidirectional LSTM (Huang et al. 2015)
Tasks
Inference QA: given a story T and a question q, answer q by performing inference over T.
Reading Comprehension: given a story T and a cloze sentence s (a sentence in which key words have been deleted), fill the gaps in s using information from T.
bAbI Dataset
The Facebook bAbI dataset
A collection of 20 tasks. Each task checks one skill that a reasoning system should have. The aim is a system able to solve all tasks with no task-specific engineering.
(T1) Single supporting fact
Questions where a single supporting fact, previously given, provides the answer. Simplest case: asking for the location of a person.
John is in the playground.
Bob is in the office.
Where is John? A: playground

(T2) Two supporting facts
Harder task: questions where two supporting statements have to be chained to answer the question.
John is in the playground.
Bob is in the office.
John picked up the football.
Bob went to the kitchen.
Where is the football? A: playground
Where was Bob before the kitchen? A: office
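For contrast with the learned models discussed later, a task like T1 can be solved by a hand-written symbolic baseline, which is exactly the task-specific engineering bAbI is designed to discourage. A minimal sketch (the verb patterns are assumptions about the bAbI surface forms, not part of the dataset spec):

```python
import re

# Track each person's latest stated location, then answer
# "Where is X?" with the most recent one (latest fact wins).
MOVE = re.compile(
    r"(\w+) (?:is in|went(?: back)? to|moved to|journeyed to|travelled to) the (\w+)"
)

def answer_where(story, question):
    locations = {}
    for sentence in story:
        m = MOVE.match(sentence)
        if m:
            locations[m.group(1)] = m.group(2)
    person = re.match(r"Where is (\w+)", question).group(1)
    return locations.get(person)

story = ["John is in the playground.", "Bob is in the office."]
print(answer_where(story, "Where is John?"))  # playground
```

Such a baseline breaks immediately on the other 19 tasks, which is the point: bAbI rewards a single model that handles all of them.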
(T3) Three supporting facts
Similarly, one can make a task with three supporting facts; the first three statements are all required to answer the question.
John picked up the apple.
John went to the office.
John went to the kitchen.
John dropped the apple.
Where was the apple before the kitchen? A: office

(T4) Two argument relations
To answer questions, the ability to differentiate and recognize subjects and objects is crucial. We consider the extreme case: sentences feature re-ordered words.
The office is north of the bedroom.
The bedroom is north of the bathroom.
What is north of the bedroom? A: office
What is the bedroom north of? A: bathroom
(T6) Yes/No questions
This task tests, in the simplest case possible (with a single supporting fact), the ability of a model to answer true/false questions.
John is in the playground.
Daniel picks up the milk.
Is John in the classroom? A: no
Does Daniel have the milk? A: yes

(T7) Counting
This task tests the ability of the QA system to perform simple counting operations, by asking about the number of objects with a certain property.
Daniel picked up the football.
Daniel dropped the football.
Daniel got the milk.
Daniel took the apple.
How many objects is Daniel holding? A: two
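The counting task too admits a trivial symbolic baseline that tracks the set of held objects; again, the verb lists are assumptions about the bAbI surface forms, and such per-task rules are what a general model must avoid:

```python
import re

# Maintain the set of objects a person currently holds.
PICK = re.compile(r"(\w+) (?:picked up|picks up|got|took|grabbed) the (\w+)")
DROP = re.compile(r"(\w+) (?:dropped|discarded|left|put down) the (\w+)")

def count_held(story, person):
    held = set()
    for sentence in story:
        m = PICK.match(sentence)
        if m and m.group(1) == person:
            held.add(m.group(2))
            continue
        m = DROP.match(sentence)
        if m and m.group(1) == person:
            held.discard(m.group(2))
    return len(held)

story = ["Daniel picked up the football.", "Daniel dropped the football.",
         "Daniel got the milk.", "Daniel took the apple."]
print(count_held(story, "Daniel"))  # 2
```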
(T17) Positional reasoning
This task tests spatial reasoning, one of many components of the classical blocks world. The Yes/No task (T6) is a prerequisite.
The triangle is to the right of the blue square.
The red square is on top of the blue square.
The red sphere is to the right of the blue square.
Is the red sphere to the right of the blue square? A: yes
Is the red square to the left of the triangle? A: yes

(T18) Reasoning about size
This task requires reasoning about the relative size of objects. Tasks 3 (three supporting facts) and 6 (Yes/No) are prerequisites.
The football fits in the suitcase.
The suitcase fits in the cupboard.
The box of chocolates is smaller than the football.
Will the box of chocolates fit in the suitcase? A: yes
(T19) Path finding
The goal is to find the path between locations. This task is difficult because it effectively involves search.
The kitchen is north of the hallway.
The den is east of the hallway.
How do you go from den to kitchen? A: west, north

(T20) Agent's motivation
John is hungry.
John goes to the kitchen.
John grabs the apple there.
Daniel is hungry.
Why did John go to the kitchen? A: hungry
Where does Daniel go? A: kitchen
Difficulty

Type | Task number | Difficulty
Single Supporting Fact | 1 | Easy
Two or Three Supporting Facts | 2-3 | Hard
Two or Three Argument Relations | 4-5 | Medium
Yes/No Questions | 6 |
Counting and Lists/Sets | 7-8 |
Simple Negation and Indefinite Knowledge | 9-10 |
Basic Coreference, Conjunctions and Compound Coreference | 11-13 |
Time Reasoning | 14 |
Basic Deduction and Induction | 15-16 |
Positional and Size Reasoning | 17-18 |
Path Finding | 19 |
Agent's Motivations | 20 |
Dynamic Memory Networks
Intuition
Imagine having to read an article, memorize it, and then get asked questions about it: hard! You can't store everything in working memory. Better: keep the input data and the question available, and allow as many glances back at the input as needed.
Dynamic Memory Networks
Semantic Memory Module (Embeddings). Use RNNs, specifically GRUs, for every module.
Ask Me Anything: Dynamic Memory Networks for Natural Language Processing, Kumar et al., ICML 2016
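To make "GRUs for every module" concrete, here is a minimal GRU cell in plain Python. The dimensions and random weights are illustrative assumptions; the real DMN learns these parameters during training:

```python
import math
import random

random.seed(0)
D = 4  # toy hidden/input size

def mat(n, m):  # small random weight matrix
    return [[random.uniform(-0.1, 0.1) for _ in range(m)] for _ in range(n)]

def mv(M, v):   # matrix-vector product
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def sig(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

Wz, Uz = mat(D, D), mat(D, D)   # update gate
Wr, Ur = mat(D, D), mat(D, D)   # reset gate
Wh, Uh = mat(D, D), mat(D, D)   # candidate state

def gru_step(x, h):
    z = sig([a + b for a, b in zip(mv(Wz, x), mv(Uz, h))])
    r = sig([a + b for a, b in zip(mv(Wr, x), mv(Ur, h))])
    rh = [ri * hi for ri, hi in zip(r, h)]
    h_tilde = [math.tanh(a + b) for a, b in zip(mv(Wh, x), mv(Uh, rh))]
    # interpolate between the old state and the candidate
    return [(1 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]

# Input-module sketch: run the GRU over (fake) word vectors of one
# sentence; the final hidden state is the sentence representation.
h = [0.0] * D
for word_vec in ([0.1] * D, [0.2] * D, [0.3] * D):
    h = gru_step(word_vec, h)
print(len(h))  # 4
```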
DMN: The Input Module
Final GRU output for the t-th sentence
DMN: Question Module
DMN: Episodic Memory, 1st hop
Hop i = 1
DMN: Episodic Memory, 2nd hop
Hop i = 2
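The two hops can be sketched as repeated soft attention over the fact representations. This is a deliberate simplification: the real DMN scores facts with a small two-layer network over a richer feature vector and aggregates them with an attention-modulated GRU, whereas the sketch below uses plain dot-product scores:

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def episode(facts, q, m):
    # gate each fact by its relevance to the question and previous memory
    g = softmax([dot(f, q) + dot(f, m) for f in facts])
    d = len(q)
    # weighted summary of the facts becomes the new memory
    return [sum(g[t] * facts[t][j] for t in range(len(facts))) for j in range(d)]

facts = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy fact representations
q = [1.0, 0.0]
m = q[:]                  # memory initialized with the question
for hop in range(2):      # two passes over the input, as on the slides
    m = episode(facts, q, m)
print(m)
```

Multiple hops let the memory shift attention: facts that only become relevant once an earlier fact has been retrieved can be picked up on the second pass, which is what transitive inference needs.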
Inspiration from Neuroscience
Episodic memory is the memory of autobiographical events (times, places, etc.): a collection of past personal experiences that occurred at a particular time and place. The hippocampus, the seat of episodic memory in humans, is active during transitive inference. In the DMN, repeated passes over the input are needed for transitive inference.
Slide from Richard Socher
DMN
When the end of the input is reached, the relevant facts are summarized in another GRU or a simple feed-forward network.
DMN: Answer Module
DMN: how many GRUs were used with 2 hops?
Experiments: QA on bAbI (1k)
Task | MemNN | DMN | Task | MemNN | DMN
1: Single Supporting Fact | 100 | | 11: Basic Coreference | | 99.9
2: Two Supporting Facts | | 98.2 | 12: Conjunction | |
3: Three Supporting Facts | | 95.2 | 13: Compound Coreference | | 99.8
4: Two Argument Relations | | | 14: Time Reasoning | 99 |
5: Three Argument Relations | 98 | 99.3 | 15: Basic Deduction | |
6: Yes/No Questions | | | 16: Basic Induction | | 99.4
7: Counting | 85 | 96.9 | 17: Positional Reasoning | 65 | 59.6
8: Lists/Sets | 91 | 96.5 | 18: Size Reasoning | 95 | 95.3
9: Simple Negation | | | 19: Path Finding | 36 | 34.5
10: Indefinite Knowledge | | 97.5 | 20: Agent's Motivations | |
Mean Accuracy (%) | 93.3 | 93.6
(blank cells not recovered from the slide)
Focus during processing a query
Comparison: MemNets vs DMN
Similarities: MemNets and DMNs both have input, scoring, attention and response mechanisms.
Differences: for input representations, MemNets use bag-of-words, or nonlinear or linear embeddings that explicitly encode position, and they iteratively run functions for attention and response. DMNs show that neural sequence models can be used naturally for input representation, attention and response; this captures position and temporality and enables a broader range of applications.
Question Dependent Recurrent Entity Network for Question Answering
Andrea Madotto Giuseppe Attardi
Examples
RQA (bAbI):
Story: John picked up the apple. John went to the office. John went to the kitchen. John dropped the apple.
Question: Where was the apple before the kitchen?
Answer: office

RC (CNN news articles):
Story: Robert Downey Jr. may be Iron Man in the popular Marvel superhero films, but he recently dealt in some advanced bionic ...
Question: star Robert Downey Jr presents a young child with a bionic arm
Answer: Iron Man
Related Work
Many models have been proposed to solve the RQA and RC tasks:
Pointer Networks, Memory Networks, Attentive Sum Reader, Attention Over Attention, EpiReader, Stanford Attentive Reader, Dynamic Entity Representation, Recurrent Entity Network, Dynamic Memory Network, Neural Turing Machine, End To End Memory Network (MemN2N)
Question Dependent Recurrent Entity Network
Based on the Memory Network framework, as a variant of the Recurrent Entity Network (Henaff et al., 2017).
Input Encoder: creates an internal representation.
Dynamic Memory: stores relevant information about entities (Person, Location, etc.).
Output Module: generates the output.
Idea: include the question (q) in the memorization process.
Input Encoder
Given a set of sentences {s_1, …, s_t} and a question q:
E ∈ R^{|V|×d} embedding matrix, with E(w) = e ∈ R^d
{e_1^{(s)}, …, e_m^{(s)}}: vectors for the words in s_t
{e_1^{(q)}, …, e_k^{(q)}}: vectors for the words in q
f^{(s)}, f^{(q)} = {f_1, …, f_m}: multiplicative masks with f_i ∈ R^d

s_t = \sum_{i=1}^{m} e_i^{(s)} \odot f_i^{(s)}
q = \sum_{i=1}^{k} e_i^{(q)} \odot f_i^{(q)}
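The masked sums above can be sketched directly; the vectors and masks below are illustrative numbers, whereas in the model the masks are learned:

```python
# s_t = sum_i e_i ⊙ f_i: elementwise-multiply each word vector by its
# multiplicative mask, then sum the results into one sentence vector.
def encode(word_vectors, masks):
    d = len(word_vectors[0])
    s = [0.0] * d
    for e, f in zip(word_vectors, masks):
        for j in range(d):
            s[j] += e[j] * f[j]
    return s

e = [[1.0, 2.0], [3.0, 4.0]]   # two word embeddings (d = 2)
f = [[0.5, 0.5], [1.0, 0.0]]   # their multiplicative masks
print(encode(e, f))  # [3.5, 1.0]
```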
Dynamic Memory
Consists of a set of blocks meant to represent entities in the story. A block is made of a key k_i that identifies the entity and a hidden state h_i that stores information about it.

g_i^{(t)} = \sigma(s_t^T h_i^{(t-1)} + s_t^T k_i^{(t-1)} + s_t^T q)
\tilde{h}_i^{(t)} = \phi(U h_i^{(t-1)} + V k_i^{(t-1)} + W s_t)
h_i^{(t)} = h_i^{(t-1)} + g_i^{(t)} \odot \tilde{h}_i^{(t)}
h_i^{(t)} \leftarrow h_i^{(t)} / \|h_i^{(t)}\|
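A toy version of one block update, with scalar U, V, W standing in for the paper's trained matrices and tanh as φ; the key point is the gate g, which in QDREN also depends on the question q:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def memory_update(h, k, s, q, U=0.5, V=0.5, W=0.5):
    # gate: content match + key match + question match
    g = sigmoid(dot(s, h) + dot(s, k) + dot(s, q))
    # candidate state (scalar U, V, W for brevity; the paper uses matrices)
    cand = [math.tanh(U * hj + V * kj + W * sj)
            for hj, kj, sj in zip(h, k, s)]
    # gated additive update, then normalization (acts as forgetting)
    h = [hj + g * cj for hj, cj in zip(h, cand)]
    norm = math.sqrt(sum(x * x for x in h)) or 1.0
    return [x / norm for x in h]

h = memory_update(h=[0.1, 0.2], k=[1.0, 0.0], s=[0.5, 0.5], q=[0.0, 1.0])
print(h)
```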
Output Module
Scores the memories h_i using q and uses the embedding matrix E ∈ R^{|V|×d} to generate the output ŷ:

p_i = \mathrm{softmax}(q^T h_i)
u = \sum_{j=1}^{z} p_j h_j
\hat{y} = R \phi(q + H u)

where ŷ ∈ R^{|V|} represents the model answer.

Training
Given {(x_i, y_i)}_{i=1}^{n}, we used the cross-entropy loss H(y, ŷ). End-to-end training with the standard Backpropagation Through Time (BPTT) algorithm.
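The attention step of the output module can be sketched as follows; the final projection ŷ = Rφ(q + Hu) into the vocabulary is omitted because R and H are trained matrices:

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def attend(memories, q):
    # p_i = softmax(q^T h_i): score each memory block against the question
    p = softmax([sum(qj * hj for qj, hj in zip(q, h)) for h in memories])
    d = len(q)
    # u = sum_j p_j h_j: weighted combination of the memories
    return [sum(p[i] * memories[i][j] for i in range(len(memories))) for j in range(d)]

memories = [[1.0, 0.0], [0.0, 1.0]]   # two memory blocks (toy)
u = attend(memories, [2.0, 0.0])      # question aligned with the first block
print(u)
```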
Overall Architecture
Experiments and Results
QDREN implemented with TensorFlow. Tests on the RQA and RC tasks using:
RQA - bAbI 1K: 20 tasks, each with 900 training, 100 validation and 1000 test examples.
RC - CNN news articles: split into training, 3924 validation and 3198 test articles.
RQA - bAbI 1K, error rates (%) on the test set:

Model | n-gram | LSTM | MemN2N | REN | QDREN
Mean Error | 65.9 | 50.8 | 13.9 | 29.6 | 18.6
Failed Tasks (>5% error) | | 20 | 11 | 15 | 8

(per-task error rates not recoverable from the slide)
RC - CNN news articles
Four variants, using either plain text or windows of text as input: REN, REN+WIND, QDREN, QDREN+WIND (example window: "the popular @entity4 superhero films").

Accuracy (%) | REN | REN+WIND | QDREN | QDREN+WIND
Validation | 42.0 | 38.0 | 39.9 | 59.1
Test | | 40.1 | 39.7 | 62.8

Baselines (%) | LSTM | Att. Reader | MemN2N | AoA
Validation | 55.0 | 61.6 | 63.4 | 73.1
Test | 57.0 | 63.0 | 66.8 | 74.4
Analysis: REN gating activation
Question: where is mary?
Story:
sandra went to the kitchen
john journeyed to the kitchen
mary travelled to the hallway
john moved to the hallway
john went back to the office
mary went to the kitchen
sandra went back to the garden
daniel journeyed to the garden
mary moved to the bathroom
sandra went back to the bathroom
(heatmap of gate activations per memory block not reproducible in text)
Analysis: QDREN gating activation
Question: where is mary?
Same story as on the previous slide; the figure shows the QDREN gate activations for comparison.
References
- Ask Me Anything: Dynamic Memory Networks for Natural Language Processing (Kumar et al., ICML 2016)
- Dynamic Memory Networks for Visual and Textual Question Answering (Xiong et al., 2016)
- Sequence to Sequence (Sutskever et al., 2014)
- Neural Turing Machines (Graves et al., 2014)
- Teaching Machines to Read and Comprehend (Hermann et al., 2015)
- Learning to Transduce with Unbounded Memory (Grefenstette et al., 2015)
- Structured Memory for Neural Turing Machines (Wei Zhang, 2015)
- End-To-End Memory Networks (Sukhbaatar et al., 2015)
- UNITN: Training Deep Convolutional Network for Twitter Sentiment Classification (Moschitti and Severyn, SemEval 2015)
- Tracking the World State with Recurrent Entity Networks (Henaff, Weston, Szlam, Bordes, LeCun; arXiv preprint, 2017)
- Memory Networks (Weston, Chopra, Bordes; arXiv preprint, 2014)
- Question Dependent Recurrent Entity Network for Question Answering (Madotto and Attardi)