Teaching Machines to Converse Jiwei Li Computer Science Department Stanford University
Collaborators Bill Dolan Microsoft Research Dan Jurafsky Stanford Alan Ritter Ohio State University Chris Brockett Microsoft Research Jason Weston Facebook AI Research Alexander Miller Facebook AI Research Sumit Chopra Facebook AI Research Marc'aurelio Ranzato Facebook AI Research Michel Galley Microsoft Research Will Monroe Stanford Jianfeng Gao Microsoft Research
Many people consider Siri a big breakthrough in AI. (Slide borrowed from Bill MacCartney)
Does Siri really understand language ? ("Cole-bear") (Slide borrowed from Bill MacCartney)
How well a machine can talk with humans has long been associated with the general success of AI. The attempt to develop a chatbot dates back to the early days of AI. (Slide from Bill MacCartney)
Why is building a chatbot hard ? Computers need to understand what you ask, and they need to generate coherent, meaningful sequences in response to what you ask, which requires domain knowledge, discourse knowledge, and world knowledge.
Background. Goal-oriented tasks (Levin et al., 1997; Young et al., 2013; Walker, 2000): they are expensive to build and hard to extend to open-domain scenarios. Open-domain chat (Ritter et al., 2010; Sordoni et al., 2015; Vinyals and Le, 2015).
Outline Mutual Information for Response Generation. (Chitchat) How to preserve Speaker Consistency (Chitchat) Reinforcement learning for Response Generation (Chitchat) Teaching a bot to ask questions (Goal-oriented)
Mutual Information for Response Generation.
Seq2Seq Models for Response Generation (Sutskever et al., 2014; Jean et al., 2014; Luong et al., 2015). We can adapt this framework to response generation, in which input messages are sources and output responses are targets. Encoding: "how are you ? eos" → Decoding: "I'm fine . EOS".
Seq2Seq Models as a Backbone. Encoding: "how are you ? eos" → Decoding: "I'm fine . EOS".
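To make the backbone concrete, here is a minimal sketch of such an encoder-decoder in PyTorch. The architecture details (a single GRU layer, hidden size 256) are illustrative simplifications, not the exact models used in the papers:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: encode the source message into a vector,
    then generate the response token by token from that vector."""
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src, tgt):
        # src: (batch, src_len) token ids, e.g. "how are you ? eos"
        # tgt: (batch, tgt_len) token ids, e.g. "eos I'm fine ."
        _, h = self.encoder(self.embed(src))           # final encoder state
        dec_out, _ = self.decoder(self.embed(tgt), h)  # teacher forcing
        return self.out(dec_out)                       # (batch, tgt_len, vocab)

# Training maximizes log p(T | S): cross-entropy between the predicted
# distribution at each step and the next response token.
model = Seq2Seq(vocab_size=50_000)
```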
Mutual Information for Response Generation. Li et al., A Diversity-Promoting Objective Function for Neural Conversation Models (NAACL 2016). The "I don't know" problem (Sordoni et al., 2015; Serban et al., 2015):
How old are you ? → I don't know .
How is life ? → I don't know what you are talking about .
Do you love me ? → I don't know what you are talking about .
Such generic responses make up roughly 30% of all generated responses.
Mutual Information for Response Generation. The behavior above is almost matched by a four-line program:

```python
def chatbot(message):
    if message.endswith("?"):
        return "I don't know"
    else:
        return "I don't know what you are talking about"
```

It is not unreasonable to generate these responses, but developing a chatbot is not just about generating reasonable responses.
Mutual Information for Response Generation. Solution #1: Adding Rules, i.e., regular-expression matching:
I don't know . / I don't know .. / I don't know … / I don't know ! / I don't know ! ! / ...
Unfortunately, the deep learning model manages to cluster all phrases that are semantically related to "I don't know" (the most comprehensive "I don't know" list I have ever seen):
I have no idea . / I don't have a clue . / I haven't the faintest idea . / How should I know ? / I don't have the foggiest idea what you are talking about . / I don't have the slightest idea what you are talking about .
Rules don't work!
Mutual Information for Response Generation. The intuition: p("I don't know" | whatever one asks) is high, but the other way around, p(what one asks | "I don't know") is low. A generic response carries almost no information about the input, so it has low mutual information with it.
Mutual Information for Response Generation. Instead of picking the target that maximizes the likelihood $\log p(T \mid S)$, pick the one that maximizes the pairwise mutual information between source and target:
$\hat{T} = \arg\max_T \{ \log p(T \mid S) - \lambda \log p(T) \}$
By Bayes' rule, $\log p(T) = \log p(T \mid S) + \log p(S) - \log p(S \mid T)$, so (dropping the constant $\log p(S)$) the objective becomes a weighted combination of the standard Seq2Seq model $p(T \mid S)$ and the reversed model $p(S \mid T)$:
$\hat{T} = \arg\max_T \{ (1 - \lambda) \log p(T \mid S) + \lambda \log p(S \mid T) \}$
Datasets and Evaluations. Datasets: Twitter conversational dataset (23M pairs); OpenSubtitles movie-script dataset (80M pairs). Evaluations: BLEU (Papineni et al., 2002); # distinct tokens; human evaluation (1,000 samples, each output evaluated by 7 judges).
[Chart] BLEU: mutual information improves over the Seq2Seq baseline across settings (gains ranging from +12.7% to +51.3%); on the Twitter dataset, +21.1% and +12.7%.
[Chart] # distinct tokens in generated targets (divided by total #) on the OpenSubtitles dataset: +385% and +122% over the baseline.
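A sketch of how this diversity statistic can be computed; the function name is mine, but the definition matches the slide (number of distinct tokens divided by total number of tokens in the generated responses):

```python
def distinct_ratio(responses):
    """# distinct tokens in generated targets divided by total # of tokens."""
    tokens = [tok for resp in responses for tok in resp.split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# A model that always says "i don 't know ." scores far lower than one
# that produces varied responses:
print(distinct_ratio(["i don 't know .", "i don 't know ."]))  # 0.5
print(distinct_ratio(["i 'm 16 .", "why are you asking ?"]))   # 1.0
```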
Human Evaluation
Sampled Results: standard Seq2Seq p(T|S) vs. mutual information.
Outline: Mutual Information for Response Generation. Speaker Consistency. Reinforcement Learning for Response Generation. Teaching a bot to ask questions.
Speaker Consistency. Li et al., 2016. A Persona-Based Neural Conversation Model.
Speaker Consistency. How old are you ? → I'm 8 . What's your age ? → 18 .
Speaker Consistency. Where do you live now ? → I live in Los Angeles. In which city do you live now ? → I live in Paris. In which country do you live now ? → England, you ?
Speaker Consistency. How old are you ? → I'm 8 . How many kids do you have ? → 4, you ?
Speaker Consistency. When were you born ? → In 1942 . When was your mother born ? → In 1966 .
How to represent users: learn persona embeddings (70k users, e.g., Bob) alongside word embeddings (50k words, e.g., uk, london, sydney; great, good, okay; stay, live; monday, tuesday).
Persona Seq2Seq model: Encoding "where do you live EOS" → Decoding "in uk . EOS", with Bob's persona embedding (one of 70k learned user vectors) fed into the decoder at every step. If you ask one user 100 questions, the 100 responses you generate are not independent, because the same user representation is incorporated each time.
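A sketch of the persona-conditioned decoder, assuming (as in the figure) that the learned speaker vector is appended to the word embedding at every decoding step. Dimensions and names are illustrative:

```python
import torch
import torch.nn as nn

class PersonaDecoder(nn.Module):
    """Decoder that injects a learned persona embedding at every step."""
    def __init__(self, vocab_size=50_000, n_speakers=70_000,
                 hidden=256, persona_dim=64):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, hidden)
        self.persona_embed = nn.Embedding(n_speakers, persona_dim)  # 70k users
        # decoder input = word embedding concatenated with persona embedding
        self.rnn = nn.GRU(hidden + persona_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tgt, speaker_ids, enc_state):
        # tgt: (batch, tgt_len) token ids; speaker_ids: (batch,) user ids
        words = self.word_embed(tgt)
        persona = self.persona_embed(speaker_ids)           # (batch, persona_dim)
        persona = persona.unsqueeze(1).expand(-1, tgt.size(1), -1)
        dec_out, _ = self.rnn(torch.cat([words, persona], dim=-1), enc_state)
        return self.out(dec_out)
```

The persona vectors are learned by backpropagation just like word embeddings, so speakers who respond similarly end up close together in persona space.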
Interaction Seq2Seq model: additionally model speaker-addressee interaction patterns within the conversation. The speaker and addressee embeddings are combined into an interaction embedding $V_{i,j} = \tanh(W_1 v_i + W_2 v_j)$, which is fed into the decoder at every step: Encoding "where do you live EOS" → Decoding "in uk . EOS".
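A sketch of the interaction embedding, reconstructing the tanh(W·) combination on the slide as V_ij = tanh(W1 v_i + W2 v_j); V_ij then replaces the single speaker vector in the persona decoder above (variable names are mine):

```python
import torch
import torch.nn as nn

class Interaction(nn.Module):
    """V_ij = tanh(W1 @ v_i + W2 @ v_j): an embedding of how
    speaker i talks to addressee j."""
    def __init__(self, n_speakers=70_000, persona_dim=64):
        super().__init__()
        self.persona_embed = nn.Embedding(n_speakers, persona_dim)
        self.W1 = nn.Linear(persona_dim, persona_dim, bias=False)
        self.W2 = nn.Linear(persona_dim, persona_dim, bias=False)

    def forward(self, speaker_ids, addressee_ids):
        v_i = self.persona_embed(speaker_ids)
        v_j = self.persona_embed(addressee_ids)
        return torch.tanh(self.W1(v_i) + self.W2(v_j))
```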
Datasets and Evaluations: conversations from Twitter (28M turns, 74,003 users, each with a minimum of 60 conversational turns). Metrics: perplexity; BLEU (4,000 single-reference examples); human evaluation.
Quantitative Results
                     Seq2Seq    Speaker Model
Perplexity           47.2       42.2 (-10.6%)
BLEU (without MMI)   0.92       1.12 (+21.7%)
BLEU (with MMI)      1.41       1.66 (+11.7%)
Human Evaluation: question pairs probing consistency, e.g., "What city do you live in ?" / "What country do you live in ?" (consistent: London / UK; inconsistent: London / US), or "Are you vegan or vegetarian ?" / "Do you eat beef ?"
Human Evaluation: which model produces more consistent answers? Each item is given to 5 judges; ties are discarded.
Human Evaluation: consistency score of 0.84 for the Seq2Seq model vs. 1.33 for the Persona model (+34.7%).
Results (No cherry-picking)
Issues How do we handle long-term dialogue success?
Outline: Mutual Information for Response Generation. Speaker Consistency. Reinforcement Learning for Response Generation. Teaching a bot to ask questions.
Issues: How do we handle long-term dialogue success? Problem 1: Dull and generic responses, i.e., the "I don't know" problem (Sordoni et al., 2015; Serban et al., 2015): "Do you love me ?" → "I don't know what you are talking about ."
Problem 2: Repetitive responses.
Shut up ! → No, you shut up ! → No, you shut up ! → No, you shut up ! → ……
See you later ! → See you later ! → ……
Issues How do we handle long-term dialogue success? Problem 1: Dull and generic responses. Problem 2: Repetitive responses. Problem 3: Short-sighted conversation decisions.
Problem 3: Short-sighted conversation decisions.
How old are you ? → i 'm 16 . → 16 ? → i don 't know what you 're talking about → you don 't know what you 're saying → i don 't know what you 're talking about → you don 't know what you 're saying → ……
The bad action ("16 ?") leads to a bad outcome, but the outcome does not emerge until a few turns later. Can reinforcement learning handle this?
Notations for Reinforcement Learning
Notations: State, the encoded representation of the preceding dialogue turns, e.g., Encode("how old are you ?").
Notations: Action, the response to generate, e.g., "i 'm 16 ." in reply to "How old are you ?".
Notations: Reward. 1. Ease of answering: measure the ease of answering a generated turn by the negative log-likelihood of responding to that utterance with a dull response such as S: "I don't know what you are talking about":
$r_1 = -\frac{1}{N_\mathbb{S}} \sum_{s \in \mathbb{S}} \frac{1}{N_s} \log p_{\text{seq2seq}}(s \mid a)$
Notations: Reward. 2. Information Flow: penalize turns that add no new information over the agent's previous turn (e.g., "See you later !" → "See you later !"):
$r_2 = -\log \cos(h_{p_i}, h_{p_{i+1}})$, where $h_{p_i}$ is the encoder representation of the agent's turn $p_i$.
Notations: Reward. 3. Meaningfulness: the response should be coherent with the preceding turns (e.g., S1: "How old are you ?" → S2: "i 'm 16 ."), measured as the mutual information between the action and the history:
$r_3 = \frac{1}{N_a} \log p_{\text{seq2seq}}(a \mid q_i, p_i) + \frac{1}{N_{q_i}} \log p^{\text{backward}}_{\text{seq2seq}}(q_i \mid a)$
Notations: Reward. The final reward is a weighted sum of the three: Ease of answering ($r_1$), Information Flow ($r_2$), Meaningfulness ($r_3$): $r = \lambda_1 r_1 + \lambda_2 r_2 + \lambda_3 r_3$.
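A sketch of the three rewards and their combination, following the definitions above; the dull-response list, the weights, and the `log_prob` scorers (stand-ins for the trained forward and backward seq2seq models) are illustrative:

```python
import numpy as np

DULL = ["i don 't know what you 're talking about",
        "i have no idea"]  # illustrative subset of the dull-response list

def ease_of_answering(action, log_prob):
    """r1: how unlikely is it that `action` gets answered with a dull response?"""
    scores = [log_prob(d, given=action) / len(d.split()) for d in DULL]
    return -np.mean(scores)

def information_flow(h_prev, h_curr):
    """r2: penalize semantic similarity between an agent's consecutive turns."""
    cos = h_prev @ h_curr / (np.linalg.norm(h_prev) * np.linalg.norm(h_curr))
    return -np.log(max(cos, 1e-8))  # clamp to keep the log defined

def meaningfulness(action, history, log_prob_fwd, log_prob_bwd):
    """r3: mutual information between the action and the preceding turns."""
    fwd = log_prob_fwd(action, given=history) / len(action.split())
    bwd = log_prob_bwd(history, given=action) / len(history.split())
    return fwd + bwd

def total_reward(r1, r2, r3, lambdas=(0.25, 0.25, 0.5)):
    """Weighted combination; the lambda values here are illustrative."""
    return lambdas[0] * r1 + lambdas[1] * r2 + lambdas[2] * r3
```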
Simulation: take a message from the training set, encode it, and decode a response r1; then encode r1 and decode the next response r2; and so on.
Turn 1 → Turn 2 → ... → Turn N: starting from an input message, repeatedly encode the last utterance and decode the next one, producing simulated turns s1, s2, ..., sN. For each simulated turn, compute Ease of answering (r1), Information Flow (r2), and Meaningfulness (r3), then compute the accumulated reward R(s1, s2, ..., sN).
Optimize the expected accumulated reward with the REINFORCE algorithm (Williams, 1992). The policy, which is what we want to learn, is the seq2seq generation distribution itself; its gradient is estimated from the simulated dialogues as $\nabla J(\theta) \approx \nabla \log p(\text{response}) \, (R - b)$, where $b$ is a baseline that reduces the variance of the estimate.
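A sketch of one REINFORCE update for the simulated dialogue, assuming each simulated turn's summed token log-probability has been kept as a differentiable tensor; the running-mean baseline is a common variance-reduction choice, not necessarily the paper's exact one:

```python
import torch

def reinforce_step(log_probs, rewards, optimizer, baseline=0.0):
    """log_probs: per-turn summed log p(response) tensors (with grad);
    rewards: matching per-turn scalar rewards from the simulation."""
    optimizer.zero_grad()
    # gradient estimate: sum over turns of grad log p(response) * (R - b)
    loss = sum(-logp * (r - baseline) for logp, r in zip(log_probs, rewards))
    loss.backward()
    optimizer.step()
    # update the running-mean baseline for the next batch
    mean_r = sum(rewards) / len(rewards)
    return 0.9 * baseline + 0.1 * mean_r
```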
Details: 1. Initialize the policy with a pre-trained sequence-to-sequence model. 2. Curriculum learning: gradually increase the number of simulated turns.
Evaluation. Baselines: vanilla Seq2Seq model; mutual-information Seq2Seq model.
Evaluation 1. Number of Simulated Turns
Evaluation 2. Diversity
Results
Input                                Mutual Information                           The proposed model
How old are you ?                    I'm 16.                                      I'm 16, why are you asking ?
What is your full name ?             I have no idea .                             What 's yours ?
I don 't want to go home tonight .   Really ?                                     Why ?
Do you have any feelings for me ?    I don 't know what you are talking about .  Would I see you if I didn 't ?
Simulation:
How old are you ? → i 'm 16, why are you asking ? → I thought you were 12 . → What made you think so ? → I don't know what you are talking about . → You don't know what you are saying. → I don't know what you are talking about .
The simulated conversation survives 4 turns before collapsing into generic responses!
Ongoing and Future Work: better automatic evaluation metrics (are BLEU and perplexity enough?); the Turing test; generative adversarial nets.
Outline Mutual Information for Response Generation. (Chitchat) How to preserve Speaker Consistency (Chitchat) Reinforcement learning for Response Generation (Chitchat) Teaching a bot to ask questions (Goal-oriented)
Introduction. Case 1:
How do you like Hom Tanks ? → Who is Hom Tanks ? Do you mean Tom Hanks ? → Hom Tanks is the leading actor in Forrest Gump . → Oh. Yeah. I like him a lot.
Introduction: what will current chatbot systems do? "How do you like Hom Tanks?" becomes "How do you like UNK ?" (the unknown name is mapped to UNK), and the model gives an output anyway: forward pass, backward pass, softmax, output, e.g., "I hate him. He's such a jerk." A search engine, by contrast, would just search the web for "how do you like Hom Tanks".
MovieQA Domain: question-answering dialogues built from a movie knowledge base, with questions generated from templates.
In what scenarios does a bot need to ask questions? Case 1: Question Clarification (Tasks 1 and 2): when the student does not understand the teacher's question, it asks for clarification using question-asking templates.
Case 2: Knowledge Operation (Tasks 3 and 4): the bot asks questions, again from question-asking templates, to help it operate on the facts in its knowledge base.
Case 3: Knowledge Acquisition: the answer is not in the KB, so the bot asks the teacher for the missing knowledge.
Settings. Setting 1: Offline supervised learning.
Training input: the teacher's question, the dialogue history (other questions and answers), and the KB facts. Output: the student's response.
Model: an End-to-End Memory Network maps the input (the question, with the dialogue history and KB facts stored as memories) to the output.
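A minimal sketch of a single-hop end-to-end memory network of the kind used here: the question attends over memories (dialogue history plus KB facts), and the attention-weighted sum feeds the answer prediction. A single hop and bag-of-words encodings are simplifications:

```python
import torch
import torch.nn as nn

class MemN2N(nn.Module):
    """Single-hop end-to-end memory network (simplified)."""
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.A = nn.Embedding(vocab_size, dim)  # memory (input) embedding
        self.C = nn.Embedding(vocab_size, dim)  # memory (output) embedding
        self.B = nn.Embedding(vocab_size, dim)  # question embedding
        self.W = nn.Linear(dim, vocab_size)     # answer prediction

    def forward(self, memories, question):
        # memories: (batch, n_mem, mem_len); question: (batch, q_len)
        m = self.A(memories).sum(dim=2)          # bag-of-words memory vectors
        c = self.C(memories).sum(dim=2)
        u = self.B(question).sum(dim=1)          # question vector (batch, dim)
        attn = torch.softmax((m @ u.unsqueeze(-1)).squeeze(-1), dim=1)
        o = (attn.unsqueeze(-1) * c).sum(dim=1)  # weighted sum of memories
        return self.W(o + u)                     # scores over candidate answers
```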
Training settings: 1. Never asking questions (TrainQA). 2. Always asking questions (TrainAQ). Each training setting corresponds to a way of generating a version of the dataset.
Test settings: 1. Never asking questions (TestQA). 2. Always asking questions (TestAQ). In each case the student must make predictions.
[Table] Results on Tasks 1-9 for the four combinations of TrainQA/TrainAQ with TestQA/TestAQ.
Results: Asking questions at test time always helps. Asking questions only at training time does not help. TrainAQ + TestAQ performs the best.
Setting 2: Reinforcement Learning. At each step the student decides: shall I ask a question? If yes, it is penalized by Cost(AQ), the price of the teacher's patience, and then receives +1 for a correct final answer or -1 for an incorrect one. If no, it simply receives +1 or -1 for its answer.
Setting 2: Reinforcement Learning. The student is a Memory Network; the binary decision of whether to ask a question is trained with policy gradient on the reward r (which includes the Cost(AQ) penalty), using a baseline to reduce variance.
[Charts] Setting 2: Reinforcement Learning results, including the "bad student" setting.
Setting 2: Reinforcement Learning. Conclusions: asking questions helps improve performance.
Conclusion: We explored multiple strategies to develop better chitchat-style chatbots (mutual information, speaker consistency, reinforcement learning), and we explored how a bot can interact with users by asking questions to better complete a goal.
Q&A
Mutual Information for Response Generation (details). MMI-antiLM: by Bayes' rule the objective is $\log p(T \mid S) - \lambda \log p(T)$; the $-\lambda \log p(T)$ term acts as an anti-language model. Training: train $p(T \mid S)$ and $p(T)$, then decode with the combined score. Problem: penalizing $p(T)$ at every position yields ungrammatical responses! Solution 1: only penalize the first few words, replacing $\log p(T)$ with $\log U(T) = \sum_k g(k)\, \log p(t_k \mid t_1, \ldots, t_{k-1})$, where $g(k) = 1$ if $k \le \gamma$ and $0$ otherwise.
Mutual Information for Response Generation (details). MMI-bidi: direct decoding of the combined objective is infeasible, so instead: (1) train $p(T \mid S)$ and $p(S \mid T)$; (2) generate an N-best list using $p(T \mid S)$; (3) rerank the N-best list using $p(S \mid T)$.
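A sketch of this decoding procedure; `forward_model.beam_search` and `backward_model.score` are stand-ins for the two trained seq2seq models, and any length bonus on the candidates is omitted here:

```python
def mmi_bidi_decode(source, forward_model, backward_model, lam=0.5, n_best=200):
    """MMI-bidi: generate N-best from p(T|S), then rerank with p(S|T)."""
    # 1. N-best candidate responses with their forward scores log p(T|S).
    candidates = forward_model.beam_search(source, beam_size=n_best)

    # 2. Combined score: (1 - lam) * log p(T|S) + lam * log p(S|T).
    def score(item):
        target, logp_t_given_s = item
        return ((1 - lam) * logp_t_given_s
                + lam * backward_model.score(source, given=target))

    # 3. Return the highest-scoring candidate response.
    return max(candidates, key=score)[0]
```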
Results (No cherry-picking)
Persona Seq2Seq model: tradeoff. [Figure: two persona-conditioned runs for Bob: Encoding "where do you live EOS" → Decoding "in uk . EOS"]