
Memory Networks Jason Weston, Sumit Chopra and Antoine Bordes Facebook AI Research

Memory Networks Class of models that combine a large memory with a learning component that can read and write to it. Most ML models have limited memory, which is more or less all that's needed for "low-level" tasks, e.g. object detection. Our motivation: long-term memory is required to read a story (or watch a movie) and then, e.g., answer questions about it. We study this by building a simple simulation to generate "stories". We also try on some real QA data.

MCTest comprehension data (Richardson et al.) James the Turtle was always getting in trouble. Sometimes he'd reach into the freezer and empty out all the food. Other times he'd sled on the deck and get a splinter. His aunt Jane tried as hard as she could to keep him out of trouble, but he was sneaky and got into lots of trouble behind her back. One day, James thought he would go into town and see what kind of trouble he could get into. He went to the grocery store and pulled all the pudding off the shelves and ate two jars. Then he walked to the fast food restaurant and ordered 15 bags of fries. He didn't pay, and instead headed home. His aunt was waiting for him in his room. She told James that she loved him, but he would have to start acting like a well-behaved turtle. After about a month, and after getting into lots of trouble, James finally made up his mind to be a better turtle. Q: What did James pull off of the shelves in the grocery store? A) pudding B) fries C) food D) splinters Q: Where did James go after he went to the grocery store? A) his deck B) his freezer C) a fast food restaurant D) his room …

MCTest comprehension data (Richardson et al.) James the Turtle was always getting in trouble. Sometimes he'd reach into the freezer and empty out all the food. Other times he'd sled on the deck and get a splinter. His aunt Jane tried as hard as she could to keep him out of trouble, but he was sneaky and got into lots of trouble behind her back. One day, James thought he would go into town and see what kind of trouble he could get into. He went to the grocery store and pulled all the pudding off the shelves and ate two jars. Then he walked to the fast food restaurant and ordered 15 bags of fries. He didn't pay, and instead headed home. His aunt was waiting for him in his room. She told James that she loved him, but he would have to start acting like a well-behaved turtle. After about a month, and after getting into lots of trouble, James finally made up his mind to be a better turtle. Q: What did James pull off of the shelves in the grocery store? A) pudding B) fries C) food D) splinters Q: Where did James go after he went to the grocery store? A) his deck B) his freezer C) a fast food restaurant D) his room … Problems: it's hard for this data to lead us to design good ML models. 1) Not enough data to train on (660 stories total). What ends up happening is that shallow methods perform similarly to fancy ones. 2) If we get something wrong we don't really understand why: every question potentially involves a different kind of reasoning, so our model has to do a lot of different things. Our solution: focus on simpler (toy) subtasks where we can generate data to check what the models we design can and cannot do.

Simulation: Basic Commands. Commands available to actors: go, get, get from, put in/on, give to, drop, look, inventory, examine. Commands only for "gods" (superusers): create, set (change the graph that represents the world).

Example QA in the bAbI world. Dataset after adding a simple grammar: Antoine went to the kitchen. Antoine picked up the milk. Antoine travelled to the office. Antoine left the milk there. Antoine went to the bathroom. Where is the milk now? (A: office) Where is Antoine? (A: bathroom) Dataset in simulation command format: antoine go kitchen antoine get milk antoine go office antoine drop milk antoine go bathroom where is milk ? (A: office) where is antoine ? (A: bathroom)
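As a rough illustration of how stories like these can be generated, the sketch below (hypothetical code, not the actual bAbI generator) keeps a small world state, applies go/get/drop commands to it, and answers "where is X" questions directly from that state.

# Minimal sketch of a toy simulation: a world state is updated per command,
# and questions are answered from that state. All names here are hypothetical.
world = {"location": {}, "holding": {}}

def apply(cmd):
    actor, verb, arg = cmd
    if verb == "go":
        world["location"][actor] = arg
        # carried objects move with the actor
        for obj, holder in world["holding"].items():
            if holder == actor:
                world["location"][obj] = arg
    elif verb == "get":
        world["holding"][arg] = actor
        world["location"][arg] = world["location"][actor]
    elif verb == "drop":
        world["holding"].pop(arg, None)  # object stays at the actor's current location

def where_is(entity):
    return world["location"].get(entity, "unknown")

story = [("antoine", "go", "kitchen"), ("antoine", "get", "milk"),
         ("antoine", "go", "office"), ("antoine", "drop", "milk"),
         ("antoine", "go", "bathroom")]
for cmd in story:
    apply(cmd)

print(where_is("milk"))     # office
print(where_is("antoine"))  # bathroom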

Task (1) Factoid QA with Single Supporting Fact (“where is actor”) Our first task consists of questions where a single supporting fact, previously given, provides the answer. We test the simplest case of this, by asking for the location of a person. A small sample of the task is thus: It can be considered the simplest case of some real world QA datasets such as in Fader et al., ‘13. We can control the difficulty by the distance in the past of the supporting fact, and how many other irrelevant facts there are. John is in the playground. Bob is in the office. Where is John? A:playground

Task (1) Factoid QA with Single Supporting Fact (“where is actor”) Our first task consists of questions where a single supporting fact, previously given, provides the answer. We test the simplest case of this, by asking for the location of a person. A small sample of the task is thus: We use supporting facts for supervision at training time, but they are not given at test time (we call this “strong supervision”). John is in the playground. Bob is in the office. Where is John? A:playground SUPPORTING FACT

(2) Factoid QA with Two Supporting Facts (“where is actor+object”) A harder task is to answer questions where two supporting statements have to be chained to answer the question: John is in the playground. Bob is in the office. John picked up the football. Bob went to the kitchen. Where is the football? A:playground Where was Bob before the kitchen? A:office

(2) Factoid QA with Two Supporting Facts (“where is actor+object”) A harder task is to answer questions where two supporting statements have to be chained to answer the question: John is in the playground. Bob is in the office. John picked up the football. Bob went to the kitchen. Where is the football? A:playground Where was Bob before the kitchen? A:office To answer the first question, “Where is the football?”, both “John picked up the football” and “John is in the playground” are supporting facts. SUPPORTING FACT

Large QA: Reverb Dataset in (Fader et al., ’13) 14M statements, stored as (subject, relation, object) triples. Triples are REVERB extractions mined from ClueWeb09. Statements cover diverse topics: (milne, authored, winnie-the-pooh), (sheep, be-afraid-of, wolf), etc... Weakly labeled QA pairs and 35M paraphrased questions from WikiAnswers: “Who wrote the Winnie the Pooh books?” “Who is Pooh’s creator?” We’ve also worked on the WebQuestions dataset (not featured in this talk).

Memory Networks MemNNs have four component networks (which may or may not have shared parameters): I: (input feature map) this converts incoming data to the internal feature representation. G: (generalization) this updates memories given new input. O: this produces new output (in feature representation space) given the memories. R: (response) converts output O into a response seen by the outside world.
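A minimal skeleton of how these four components could be wired together; this is an illustrative structure I am assuming, not the authors' released code, and the actual I, G, O and R modules are the learned components described on the following slides.

class MemoryNetwork:
    """Skeleton wiring of the four MemNN components (illustrative only)."""
    def __init__(self, I, G, O, R):
        self.I, self.G, self.O, self.R = I, G, O, R
        self.memories = []

    def step(self, x):
        features = self.I(x)                              # I: map input to internal features
        self.memories = self.G(features, self.memories)   # G: write/update memories
        output = self.O(features, self.memories)          # O: read memories, produce output features
        return self.R(output)                             # R: decode output features into a response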

Related Memory Models RNNSearch (Bahdanau et al.) for Machine Translation Can be seen as a MemNN where the memory goes back only one sentence (it writes an embedding for each word). At prediction time, it reads the memory and performs a softmax to find the best alignment (most useful words). Generating Sequences With RNNs (Graves, ’13) Also does alignment with the previous sentence to generate handwriting (so the RNN knows what letter it’s currently on). Neural Turing Machines (Graves et al., ’14) [on arXiv just 5 days after MemNNs!] Has read and write operations over memory to perform tasks (e.g. copy, sort, associative recall). 128 memory slots in experiments; content addressing computes a score for each slot, so possibly slow for a large memory? Earlier work by (Das, ’92), (Schmidhuber et al., ’93), DISCERN (Miikkulainen, ’90) and others...

Our First MemNN Implementation I (input): no conversion, keep the original text x. G (generalization): stores I(x) in the next available slot m_N. O (output): loops over all memories k=1 or 2 times: 1st loop max: finds the best match m_i with x. 2nd loop max: finds the best match m_j with (x, m_i). The output o is represented with (x, m_i, m_j). R (response): ranks all words in the dictionary given o and returns the best single word. (OR: use a full RNN here.)
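In code, the O and R modules described above could look roughly like the following sketch; this is assumed pseudocode with hypothetical `score` and `score_word` functions, not the released implementation.

def O_module(x, memories, score, k=2):
    # 1st hop: find the memory that best matches the input x
    m_i = max(memories, key=lambda m: score([x], m))
    selected = [m_i]
    if k == 2:
        # 2nd hop: find the memory that best matches (x, m_i) jointly
        m_j = max(memories, key=lambda m: score([x, m_i], m))
        selected.append(m_j)
    return [x] + selected  # o = (x, m_i[, m_j])

def R_module(o, vocabulary, score_word):
    # R: rank every word in the dictionary against o, return the best single word
    return max(vocabulary, key=lambda w: score_word(o, w))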

Some Extensions Some options and extensions: If the input is at the character or word level, one could group inputs (i.e. learn to segment the input into chunks) and put each chunk in a memory slot. Use an RNN for module R to give true responses. If the memory is huge (e.g. Wikipedia) we need to organize the memories. Solution: S can hash the memories to store them in buckets (topics). Then, G and O don’t operate on all memories. If the memory is full, there could be a way of removing the memory it thinks is most useless, i.e. it “forgets” somehow. That would require a scoring function of the utility of each memory. Explorations of the first 3 points can be found in the paper.

Matching function For a given Q, we want a good match to the relevant memory slot(s) containing the answer, e.g.: Match(Where is the football ?, John picked up the football) We use a q^T U^T U d embedding model with word embedding features: LHS features: Q:Where Q:is Q:the Q:football Q:? RHS features: D:John D:picked D:up D:the D:football QDMatch:the QDMatch:football (QDMatch:football is a feature to say there’s a Q&A word match, which can help.) The parameters U are trained with a margin ranking loss: supporting facts should score higher than non-supporting facts.
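A small sketch of this q^T U^T U d scoring model and the margin ranking criterion, assuming a simple bag-of-words featurisation; the helper names here are hypothetical, and the actual training loop (gradient updates on U) is omitted.

import numpy as np

def featurize(words, vocab):
    # bag-of-words indicator vector over the feature vocabulary
    v = np.zeros(len(vocab))
    for w in words:
        if w in vocab:
            v[vocab[w]] = 1.0
    return v

def match_score(U, q_feat, d_feat):
    # s(q, d) = q^T U^T U d: embed both sides with U, then take the dot product
    return float(np.dot(U @ q_feat, U @ d_feat))

def margin_ranking_loss(U, q_feat, pos_feat, neg_feats, margin=0.1):
    # the supporting fact should outscore every non-supporting fact by at least `margin`
    s_pos = match_score(U, q_feat, pos_feat)
    return sum(max(0.0, margin - s_pos + match_score(U, q_feat, f)) for f in neg_feats)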

Matching function: 2nd hop On the 2nd hop we match the question & the 1st-hop fact to a new fact: Match( [Where is the football ?, John picked up the football], John is in the playground) We use the same q^T U^T U d embedding model: LHS features: Q:Where Q:is Q:the Q:football Q:? Q2:John Q2:picked Q2:up Q2:the Q2:football RHS features: D:John D:is D:in D:the D:playground QDMatch:the QDMatch:is ... Q2DMatch:John We also need time information for the bAbI simulation. We tried adding absolute time differences (between two memories) as a feature: tricky to get to work.
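The hop-2 feature construction could be sketched as follows, using the prefixes listed above; the helper name and its exact signature are hypothetical.

def hop2_features(question, first_hop_fact, candidate_fact):
    # question words get "Q:", first-hop memory words "Q2:", candidate fact words "D:",
    # plus QDMatch/Q2DMatch indicators for words the candidate shares with each side
    lhs = [f"Q:{w}" for w in question] + [f"Q2:{w}" for w in first_hop_fact]
    rhs = [f"D:{w}" for w in candidate_fact]
    rhs += [f"QDMatch:{w}" for w in candidate_fact if w in question]
    rhs += [f"Q2DMatch:{w}" for w in candidate_fact if w in first_hop_fact]
    return lhs, rhs

lhs, rhs = hop2_features(
    ["where", "is", "the", "football", "?"],
    ["john", "picked", "up", "the", "football"],
    ["john", "is", "in", "the", "playground"])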

Comparing triples Seems to work better if we compare triples: Match(Q, D, D') returns < 0 if D is better than D', and returns > 0 if D' is better than D. We can loop through the memories, keeping the best m_i at each step. Now the features include relative time features: LHS: same as before. RHS: features(D) DbeforeQ:0-or-1 -features(D') D'beforeQ:0-or-1 DbeforeD':0-or-1
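A sketch of the resulting search over memories, assuming a hypothetical match_triple(Q, D, D') function with the sign convention above; a single pass keeps the current best candidate.

def best_memory(question, memories, match_triple):
    # match_triple < 0 means the current best D wins, > 0 means the challenger D' wins
    best = memories[0]
    for d_prime in memories[1:]:
        if match_triple(question, best, d_prime) > 0:
            best = d_prime
    return best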

Results: QA on Reverb data from (Fader et al.) 14M statements stored in the MemNN memory. MemNN with k=1 loops, 128-dim embedding. The R response module simply outputs the top-scoring statement. Time features are not necessary, hence not used. We also tried adding bag-of-words (BoW) features.

Fast QA on Reverb data Scoring all 14M candidates in the memory is slow. We consider speedups using hashing in S and O as mentioned earlier: Hashing via words (essentially: inverted index) Hashing via k-means in embedding space (k=1000)
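For illustration, the k-means variant could be sketched as below; scikit-learn's KMeans is an assumed stand-in for whatever clustering was actually used, and the bucket sizes/parameters are only indicative.

from sklearn.cluster import KMeans

def build_buckets(memory_embeddings, n_buckets=1000):
    # offline: cluster all memory embeddings into n_buckets buckets
    km = KMeans(n_clusters=n_buckets, n_init=4).fit(memory_embeddings)
    buckets = {}
    for idx, label in enumerate(km.labels_):
        buckets.setdefault(label, []).append(idx)
    return km, buckets

def candidate_memories(query_embedding, km, buckets):
    # at query time, only memories in the query's bucket are scored exactly
    label = km.predict([query_embedding])[0]
    return buckets.get(label, [])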

bAbI Experiment 1 10k sentences. (Actor: only ask questions about actors.) Difficulty: how many sentences in the past the relevant entity was mentioned. Fully supervised (supporting sentences are labeled). Compare an RNN (no supervision) and MemNNs with hops k = 1 or 2, with/without time features. Difficulty 5: max memory size required: 65; average memory size required: 9.

bAbI Experiment 1 Example test story + predictions: Antoine went to the kitchen. Antoine got the milk. Antoine travelled to the office. Antoine dropped the milk. Sumit picked up the football. Antoine went to the bathroom. Sumit moved to the kitchen. where is the milk now? A: office where is the football? A: kitchen where is Antoine ? A: bathroom where is Sumit ? A: kitchen where was Antoine before the bathroom? A: office

Unsegmented setup; R module is an RNN Joe went to the garden then Fred picked up the milk; Joe moved to the bathroom and Fred dropped the milk, and then Dan moved to the living room. Where is Dan? A: living room I believe Where is Joe? A: the bathroom Fred moved to the bedroom and Joe went to the kitchen then Joe took the milk there and Dan journeyed to the bedroom; Joe discarded the milk. Where is the milk now? A: the milk is in the kitchen Where is Dan now? A: I think he is in the bedroom Joe took the milk there, after that Mike travelled to the office, then Joe went to the living room, next Dan went back to the kitchen and Joe travelled to the office. Where is Joe now? A: I think Joe is in the office

Dealing with new words Sometimes words come along that you’ve never seen before, e.g. a new name: “Here,” said Elrond, turning to Gandalf, “is Boromir, a man from the South. He arrived in the grey morning, and seeks for counsel.” Our approach: incorporate missing word prediction in the same model. Boromir is represented as unknown with left and right context words, features shown in bold: Then Boromir goes to the office → D:UNK-L:Then D:UNK-R:goes D:UNK-R:to D:goes D:to D:the D:office We can learn this with the original training data using a kind of “dropout”: sometimes we pretend we don’t have a word’s embedding, and represent it as above instead.
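A sketch of this unseen-word featurisation and the dropout-style training trick; the helper name, context window size, and drop probability are all hypothetical choices for illustration.

import random

def word_features(words, i, known, drop_prob=0.0):
    # an unknown word (or a known word randomly "dropped" at training time)
    # is represented by its left/right context words instead of its own embedding
    w = words[i]
    unknown = w not in known or random.random() < drop_prob
    if not unknown:
        return [f"D:{w}"]
    feats = [f"D:UNK-L:{words[j]}" for j in range(max(0, i - 2), i)]
    feats += [f"D:UNK-R:{words[j]}" for j in range(i + 1, min(len(words), i + 3))]
    return feats

# "Then Boromir goes to the office" with Boromir unseen:
print(word_features(["Then", "Boromir", "goes", "to", "the", "office"], 1,
                    known={"Then", "goes", "to", "the", "office"}))
# ['D:UNK-L:Then', 'D:UNK-R:goes', 'D:UNK-R:to']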

A MemNN that has never seen Frodo, Bilbo or the ring before.. The “story” told to the model after training: “Bilbo travelled to the cave. Gollum dropped the ring there. Bilbo took the ring. Bilbo went back to the Shire. Bilbo left the ring there. Frodo got the ring. Frodo journeyed to Mount-Doom. Frodo dropped the ring there. Sauron died. Frodo went back to the Shire. Bilbo travelled to the Grey-havens. The End.” MemNN’s answers to some questions: Where is the ring? A: Mount-Doom Where is Bilbo now? A: Grey-havens Where is Frodo now? A: Shire

A MemNN multitasked on bAbI data and Reverb QA data The “story” told to the model after training: Antoine went to the kitchen. Antoine picked up the milk. Antoine travelled to the office. MemNN’s answers to some questions: Where is the milk? A: office Where was Antoine before the office? A: kitchen Where does milk come from? A: milk come from cow What is a cow a type of? A: cow be female of cattle Where are cattle found? A: cattle farm become widespread in brazil What does milk taste like? A: milk taste like milk What does milk go well with? A: milk go with coffee

What’s next? Make it harder/add more stuff, e.g. “he went..”, “Frodo and Sam”, etc.!!! MemNNs that reason with more than 2 supporting memories. Weakly supervised? Ask questions? Say statements? Perform actions? MCTest reading comprehension data (Richardson et al.) Do MemNN ideas extend to other ML tasks and model variants…? [A: probably, yes!].

What’s next? Make it harder/add more stuff, e.g. “he went..”, “Frodo and Sam”, etc.!!! MemNNs that reason with more than 2 supporting memories. Weakly supervised? Ask questions? Say statements? Perform actions? MCTest comprehension data (Richardson et al.) Do MemNN ideas extend to other ML tasks and model variants…? [A: probably, yes!]. J. Weston, A. Bordes, S. Chopra, T. Mikolov. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. arXiv:

What else is next? Make it harder/add more stuff, e.g. “he went..”, “Frodo and Sam”, etc.!!! MemNNs that reason with more than 2 supporting memories. Weakly supervised? Ask questions? Say statements? Perform actions? MCTest comprehension data (Richardson et al.) Do MemNN ideas extend to other ML tasks and model variants…? [A: probably, yes!]. S. Sukhbaatar, A. Szlam, J. Weston, R. Fergus. Weakly Supervised Memory Networks. arXiv:

Thanks!