1
Teaching Machines to Converse
Jiwei Li Computer Science Department Stanford University
2
Collaborators Bill Dolan Microsoft Research Dan Jurafsky Stanford
Alan Ritter Ohio State University Chris Brockett Microsoft Research Jason Weston Facebook AI Research Alexander Miller Facebook AI Research Sumit Chopra Facebook AI Research Marc'aurelio Ranzato Facebook AI Research Michel Galley Microsoft Research Will Monroe Stanford Jianfeng Gao Microsoft Research
4
Borrowed From Bill MacCartney’s slides
Many people consider Siri a big breakthrough in AI.
5
Does Siri really understand language ?
Many of us have seen this: "Cole-bear". (Slides borrowed from Bill MacCartney.)
14
Slide From Bill MacCartney
How well a machine can talk with humans has long been associated with the general success of AI, and attempts to develop chatbots date back to the early days of AI. (Slide from Bill MacCartney.)
15
Why is building a chatbot hard ?
Computers need to understand what you ask, and they need to generate coherent, meaningful sequences in response, which requires domain knowledge, discourse knowledge, and world knowledge.
18
Background
19
Background: Goal-Oriented Tasks (Levin et al., 1997; Young et al., 2013; Walker, 2000)
These systems are expensive and hard to extend to open-domain scenarios (Ritter et al., 2010; Sordoni et al., 2015; Vinyals and Le, 2015).
21
Outline Mutual Information for Response Generation. (Chitchat)
How to preserve Speaker Consistency (Chitchat) Reinforcement learning for Response Generation (Chitchat) Teaching a bot to ask questions (Goal-oriented)
22
Mutual Information for Response Generation.
23
Seq2Seq Models for Response Generation
(Sutskever et al., 2014; Jean et al., 2014; Luong et al., 2015) Source: input messages. Target: responses. We can adapt this framework to response generation, in which input messages are sources and output responses are targets.
34
Seq2Seq Models as a Backbone
[Figure: the encoder reads "how are you ? eos"; the decoder emits "I'm fine . EOS".]
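As a concrete companion to the figure, a minimal seq2seq sketch is shown below. The talk only specifies the encoder-decoder framing; the GRU cells, sizes, and greedy decoding here are illustrative assumptions.

```python
# Minimal encoder-decoder sketch in PyTorch (architecture details assumed).
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src, tgt):
        # Encode the source message into a final hidden state.
        _, h = self.encoder(self.embed(src))
        # Decode the response conditioned on that state (teacher forcing).
        dec, _ = self.decoder(self.embed(tgt), h)
        return self.out(dec)  # logits over the vocabulary at each step

    @torch.no_grad()
    def respond(self, src, eos_id, max_len=20):
        _, h = self.encoder(self.embed(src))
        token = torch.tensor([[eos_id]])  # decoding starts from eos
        reply = []
        for _ in range(max_len):
            dec, h = self.decoder(self.embed(token), h)
            token = self.out(dec).argmax(-1)  # greedy choice of next word
            if token.item() == eos_id:
                break
            reply.append(token.item())
        return reply
```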
35
Mutual Information for Response Generation.
Li et al., A Diversity-Promoting Objective Function for Neural Conversation Models (NAACL 2016). The "I don't know" problem (Sordoni et al., 2015; Serban et al., 2015).
36
The "I don't know" problem, by example:
How old are you ? I don't know .
How is life ? I don't know what you are talking about .
Do you love me ? I don't know what you are talking about .
Such dull responses account for roughly 30 percent of all generated responses.
39
Mutual Information for Response Generation.
def ChatBot(string):
    if string[-1] == "?":
        return "I don't know"
    else:
        return "I don't know what you are talking about"
It is not unreasonable to generate these responses. But developing a chatbot is not just about generating a reasonable response.
40
Mutual Information for Response Generation.
Solution #1: Adding rules. Regular-expression matching catches "I don't know .", "I don't know ..", "I don't know …", "I don't know !", "I don't know ! !", "I don't know ! ! !". Unfortunately, deep learning models manage to cluster all phrases that are semantically related to "I don't know": "I have no idea .", "I don't have a clue .", "I haven't the faintest idea", "How should I know ?", "I don't have the foggiest idea what you are talking about .", "I don't have the lightest idea what you are talking about ." (the most comprehensive "I don't know" list I have ever seen …). Rules don't work !!
44
Mutual Information for Response Generation.
"I don't know" is a likely response whatever one asks. The other way around, however, "I don't know" tells you almost nothing about what one asked: input and response share little mutual information. A good response should be informative about its input.
52
Mutual Information for Response Generation.
Bayes' rule connects the standard Seq2Seq objective p(T|S) with the backward probability p(S|T) and the language model p(T), so the mutual-information objective can be rewritten in terms of quantities we can train.
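The slides present this derivation as figures; a hedged reconstruction from the NAACL 2016 paper (S the source message, T the target response, λ a tunable weight) is:

```latex
\hat{T} = \arg\max_{T} \left\{ \log p(T \mid S) - \lambda \log p(T) \right\}
```

and since Bayes' rule gives \log p(T) = \log p(T \mid S) + \log p(S) - \log p(S \mid T), this equals, up to the constant \lambda \log p(S):

```latex
\hat{T} = \arg\max_{T} \left\{ (1-\lambda) \log p(T \mid S) + \lambda \log p(S \mid T) \right\}
```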
57
Datasets and Evaluations
Datasets: Twitter conversational dataset (23M pairs); OpenSubtitles movie-script dataset (80M pairs).
Evaluations: BLEU (Papineni et al., 2002), # distinct tokens, human evaluation (1000 samples, each output evaluated by 7 judges).
59
Datasets and Evaluations
[Chart: BLEU improvements: +26.4%, +51.3%, +12.7%, +35.0%, +22.5%.]
60
Datasets and Evaluations
[Chart: BLEU on the Twitter dataset: +21.1%, +12.7%.]
61
Datasets and Evaluations
[Chart: # distinct tokens in generated targets (divided by total #) on the OpenSubtitles dataset: +385%, +122%.]
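For concreteness, a small sketch of the diversity metric as described here (distinct tokens divided by total generated tokens); the function name and whitespace tokenization are illustrative assumptions:

```python
# Diversity metric sketch: distinct generated tokens over total tokens.
def distinct_ratio(responses):
    tokens = [tok for r in responses for tok in r.split()]
    return len(set(tokens)) / max(len(tokens), 1)

print(distinct_ratio(["i don 't know", "i don 't know"]))   # low diversity
print(distinct_ratio(["i 'm 16", "why are you asking ?"]))  # higher diversity
```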
62
Human Evaluation
64
Sampled Results: standard Seq2Seq p(t|s) vs. Mutual Information
65
Outline: Mutual Information for Response Generation.
Speaker Consistency Reinforcement learning for Response Generation Teaching a bot to ask questions
66
Speaker Consistency. Li et al., A Persona-Based Neural Conversation Model (ACL 2016).
67
Speaker Consistency: How old are you ? I'm 8 . What's your age ? 18
69
Speaker Consistency: Where do you live now ? I live in Los Angeles. In which city do you live now ? I live in Paris. In which country do you live now ? England, you ?
72
Speaker Consistency: How old are you ? I'm 8 . How many kids do you have ? 4, you ?
74
Speaker Consistency: When were you born ? In 1942. When was your mother born ? In 1966.
76
How to represent users Persona embeddings (70k) Bob
77
How to represent users: word embeddings (50k vocabulary: uk, london, sydney, great, good, stay, live, okay, monday, tuesday, …) and persona embeddings (70k users: Bob, …).
78
Persona seq2seq model
Encoding: where do you live EOS. Decoding: in uk . EOS, with Bob's persona embedding fed to the decoder at every step. If you ask one user 100 questions, the 100 responses you generate are not independent, because the same user representation is incorporated every time.
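A hedged sketch of the decoder side of this model. Concatenating the persona embedding with the word embedding at each decoding step is one reasonable reading of the figure; all sizes and names are illustrative assumptions.

```python
# Persona (Speaker) model sketch: every decoding step sees the same
# speaker embedding, so all of "Bob's" responses are conditioned on it.
import torch
import torch.nn as nn

class PersonaDecoder(nn.Module):
    def __init__(self, vocab_size, n_speakers, hidden=256, persona_dim=64):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, hidden)
        self.persona_embed = nn.Embedding(n_speakers, persona_dim)
        self.rnn = nn.GRU(hidden + persona_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tgt, speaker_id, h0):
        words = self.word_embed(tgt)                 # (B, T, hidden)
        persona = self.persona_embed(speaker_id)     # (B, persona_dim)
        persona = persona.unsqueeze(1).expand(-1, words.size(1), -1)
        dec, _ = self.rnn(torch.cat([words, persona], -1), h0)
        return self.out(dec)                         # logits per step
```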
84
Interaction Seq2Seq model
Encoding: where do you live. The model captures speaker-addressee interaction patterns within the conversation by adding both speaker and addressee embeddings.
85
Interaction Seq2Seq model
The two persona embeddings are combined into an interaction vector tanh(W · [speaker ; addressee]) that conditions the decoder: where do you live EOS, decoding to in uk .
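A hedged reading of the tanh(W* ) fragment on these slides: combine the speaker and addressee embeddings into one interaction vector. The exact parameterization is an assumption reconstructed from the slide.

```python
# Speaker-addressee interaction vector sketch.
import torch
import torch.nn as nn

class Interaction(nn.Module):
    def __init__(self, persona_dim=64):
        super().__init__()
        self.W = nn.Linear(2 * persona_dim, persona_dim)

    def forward(self, speaker_emb, addressee_emb):
        # tanh(W * [speaker ; addressee]) conditions the decoder.
        return torch.tanh(self.W(torch.cat([speaker_emb, addressee_emb], -1)))
```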
89
Datasets and Evaluations
Conversations from Twitter: 28M turns, 74,003 users, each with a minimum of 60 conversational turns. Evaluation: perplexity, BLEU (4,000 single-reference examples), human evaluation.
90
Quantitative Results:
Perplexity: Seq2Seq 47.2, Speaker Model 42.2 (-10.6%)
BLEU (without MMI): Seq2Seq 0.92, Speaker Model 1.12 (+21.7%)
BLEU (with MMI): Seq2Seq 1.41, Speaker Model 1.66 (+11.7%)
91
Human Evaluation: Question Pairs
What city do you live in ? / What country do you live in ? (answering London/UK is consistent; London/US is not). Are you vegan or vegetarian ? / Do you eat beef ?
95
Human Evaluation Which Model produces more consistent answers ?
Each item is given to 5 judges; ties are discarded. For each item, judges mark whether the Seq2Seq model or the Persona model produces the more consistent answers.
96
Human Evaluation: Seq2Seq Model 0.84 vs. Persona Model 1.33 (+34.7%).
97
Results (No cherry-picking)
101
Issues How do we handle long-term dialogue success?
102
Outline: Mutual Information for Response Generation.
Speaker Consistency Reinforcement learning for Response Generation Teaching a bot to ask questions
103
Issues How do we handle long-term dialogue success?
Problem 1: Dull and generic responses.
104
Issues. Problem 1: Dull and generic responses. The "I don't know" problem (Sordoni et al., 2015; Serban et al., 2015): Do you love me ? I don't know what you are talking about.
105
Issues How do we handle long-term dialogue success?
Problem 1: Dull and generic responses. Problem 2: Repetitive responses.
106
Problem 2: Repetitive responses.
Shut up ! / No, you shut up ! / No, you shut up ! / No, you shut up ! / ……
See you later ! / See you later ! / ……
112
Issues How do we handle long-term dialogue success?
Problem 1: Dull and generic responses. Problem 2: Repetitive responses. Problem 3: Short-sighted conversation decisions.
113
Problem 3: Short-sighted conversation decisions.
How old are you ?
i 'm 16 .
16 ?
i don 't know what you 're talking about
you don 't know what you 're saying
i don 't know what you 're talking about
you don 't know what you 're saying
A bad action early on traps the conversation in this loop, but the bad outcome does not emerge until a few turns later.
124
Can reinforcement learning handle this?
125
Notations for Reinforcement Learning
126
Notations: State. The dialogue history so far, e.g. the encoding of "How old are you ?".
127
Notations: Action. The response the agent generates, e.g. "i 'm 16 ." in reply to "How old are you ?".
128
Notations: Reward How old are you ? i 'm 16 .
129
Notations: Reward 1. Ease of answering. We propose to measure the ease of answering a generated turn by the negative log likelihood of responding to that utterance with a dull response such as S: "I don't know what you are talking about".
132
Notations: Reward 2. Information Flow. Penalize turns that merely repeat the agent's own previous turn (e.g. "See you later !" answered again with "See you later !").
135
Notations: Reward 3. Meaningfulness. Generated turns should be coherent with what precedes them (S1: How old are you ? S2: i 'm 16 .).
136
Notations: Reward. The total reward combines ease of answering (R1), information flow (R2), and meaningfulness (R3).
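A hedged sketch of the three components. The exact formulas and the combination weights are reconstructed assumptions, not quotes from the slides: lp_dull_values are per-token log-likelihoods of answering a turn with canned dull responses, cosine compares turn representations, and lp_fwd / lp_bwd are forward and backward seq2seq log-likelihoods.

```python
# Reward components sketch (formulas and weights assumed).
import math

def ease_of_answering(lp_dull_values):
    # R1: the harder it is to answer the turn with something dull, the better.
    return -sum(lp_dull_values) / len(lp_dull_values)

def information_flow(prev_vec, new_vec, cosine):
    # R2: penalize an agent that keeps repeating itself across its turns.
    return -math.log(max(cosine(prev_vec, new_vec), 1e-8))

def meaningfulness(lp_fwd, lp_bwd):
    # R3: the new turn should be predictable from the previous one and
    # vice versa (semantic coherence).
    return lp_fwd + lp_bwd

def total_reward(r1, r2, r3, weights=(0.25, 0.25, 0.5)):
    # Weighted combination; these weights are an assumption.
    return weights[0] * r1 + weights[1] * r2 + weights[2] * r3
```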
137
Simulation
Take a message from the training set, encode it, and decode a response r1; encode r1 and decode r2; and so on, letting two copies of the model talk to each other.
142
Turn 1 … Turn N: input message, encode, decode S1; encode, decode S2; …; Sn. Compute the accumulated reward R(S1, S2, …, Sn).
144
At each simulated turn, ease of answering (R1), information flow (R2), and meaningfulness (R3) are computed and combined into the accumulated reward R(S1, S2, …, Sn).
150
REINFORCE Algorithm (Williams, 1992). What we want to learn is the policy: the seq2seq parameters that map a dialogue state to a distribution over responses. The gradient weights the log-likelihood of sampled responses by the accumulated reward.
157
Details: 1. Initialize the policy with a pre-trained sequence-to-sequence model. 2. Curriculum learning: gradually increase the number of simulated turns.
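A hedged sketch of how the REINFORCE update might be wired to the simulation above. `policy.sample`, `policy.baseline`, and `reward_fn` are hypothetical interfaces, not from the talk; the baseline term is a standard variance-reduction assumption.

```python
# REINFORCE over simulated dialogue turns (interfaces assumed).
import torch

def reinforce_step(policy, optimizer, input_message, n_turns, reward_fn):
    state, log_probs, turns = input_message, [], []
    for _ in range(n_turns):
        action, log_p = policy.sample(state)   # sample a full response
        log_probs.append(log_p)                # sum of token log-probs
        turns.append(action)
        state = action                         # the reply becomes the next input
    R = reward_fn(turns)                       # accumulated reward R(S1..Sn)
    baseline = policy.baseline(input_message)  # variance reduction (assumed)
    loss = -(R - baseline) * torch.stack(log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```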
159
Evaluation Baselines: Vanilla Seq2seq model
Mutual information Seq2seq model
160
Evaluation 1. Number of Simulated Turns
161
Evaluation 2. Diversity
166
Results (Input / Mutual Information / The proposed model):
How old are you ? / I'm 16. / I'm 16, why are you asking ?
what is your full name ? / i have no idea / what 's yours ?
I don 't want to go home tonight . / Really ? / Why ?
Do you have any feelings for me ? / I don't know what you are talking about. / Would I see you if I didn 't ?
170
Simulation
How old are you ?
i 'm 16, why are you asking ?
I thought you were 12 .
What made you think so ?
I don't know what you are talking about .
You don't know what you are saying.
I don't know what you are talking about .
Survives 4 turns !!
178
Ongoing Work Better automatic evaluation metrics. (BLEU ? Perplexity ?)
179
Future Work
Better automatic evaluation metrics (BLEU ? Perplexity ?). Turing test. Generative adversarial nets.
181
Outline Mutual Information for Response Generation. (Chitchat)
How to preserve Speaker Consistency (Chitchat) Reinforcement learning for Response Generation (Chitchat) Teaching a bot to ask questions (Goal-oriented)
182
Introduction: How do you like Hom Tanks?
183
Introduction, Case 1:
How do you like Hom Tanks? / Who is Hom Tanks? Do you mean Tom Hanks? / Hom Tanks is the leading actor in Forrest Gump. / Oh. Yeah. I like him a lot.
187
Introduction: What will current chatbot systems do?
"How do you like Hom Tanks?" becomes "How do you like UNK ?", and the system gives an output anyway (forward, backward, softmax), e.g. "I hate him. He's such a jerk." Or it falls back to search: Searching the Web for "how do you like Hom Tanks".
194
MovieQA Domain: question templates.
197
In what scenarios does a bot need to ask questions ?
Case 1: Question Clarification (Tasks 1 and 2), with question-asking templates.
210
In what scenarios does a bot need to ask questions ?
Case 2: Knowledge Operation (Tasks 3 and 4), with question-asking templates.
213
In what scenarios does a bot need to ask questions ?
Case 3: Knowledge Acquisition: the relevant fact is not in the KB (… other questions / other answers).
215
Settings
216
1. Off-line supervised settings
217
Training input: the teacher's question, the dialogue history, and KB facts (… other questions / other answers). An end-to-end memory network maps this input to the output.
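The slide names an end-to-end memory network as the learner; below is a hedged single-hop sketch in the style of Sukhbaatar et al. (2015). The bag-of-words encoding and all sizes are illustrative assumptions.

```python
# Single-hop end-to-end memory network sketch.
import torch
import torch.nn as nn

class MemN2N(nn.Module):
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.A = nn.EmbeddingBag(vocab_size, dim)  # memory encoder
        self.B = nn.EmbeddingBag(vocab_size, dim)  # question encoder
        self.C = nn.EmbeddingBag(vocab_size, dim)  # output memory encoder
        self.W = nn.Linear(dim, vocab_size)        # answer prediction

    def forward(self, memories, question):
        # memories: (N, L) token ids for history turns and KB facts
        # question: (1, L) token ids for the teacher's question
        m = self.A(memories)                       # (N, dim)
        q = self.B(question)                       # (1, dim)
        p = torch.softmax(m @ q.t(), dim=0)        # attention over memories
        o = (p * self.C(memories)).sum(0, keepdim=True)
        return self.W(o + q)                       # logits over answers
```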
223
Training Settings
Training settings: 1. Never asking questions (TrainQA). 2. Always asking questions (TrainAQ). Each training setting corresponds to a way of generating a dataset.
229
Test settings: 1. Never asking questions (TestQA). 2. Always asking questions (TestAQ): ask the question, then make predictions.
231
Tasks 1-9, evaluated over {TrainQA, TrainAQ} × {TestQA, TestAQ}.
232
Results: Asking questions always helps at test time. Only asking questions at training time does not help. TrainAQ + TestAQ performs best.
236
Setting 2: Reinforcement Learning
Shall I ask a question ??? The bot decides whether to ask. If yes, it is penalized by Cost(AQ) for asking, then receives +1 if its final answer is correct and -1 if it is wrong. If no, it answers directly and likewise receives +1 or -1.
252
Setting 2: Reinforcement Learning
A memory network decides whether to ask a question or not, and is rewarded or penalized by r.
255
Setting 2: Reinforcement Learning
The ask-or-not decision is trained with policy gradient, with a baseline for variance reduction.
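A hedged sketch of this policy-gradient step with a learned baseline. The state featurization, reward interface, and the joint baseline loss are illustrative assumptions rather than the talk's exact recipe.

```python
# Binary ask-or-not policy trained with policy gradient plus baseline.
import torch
import torch.nn as nn

class AskPolicy(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.score = nn.Linear(dim, 2)    # logits: [answer directly, ask]
        self.value = nn.Linear(dim, 1)    # baseline estimate of the reward

    def step(self, state_vec, env_reward_fn, optimizer):
        logits = self.score(state_vec)
        dist = torch.distributions.Categorical(logits=logits)
        ask = dist.sample()               # 1 = ask a question (incurs Cost(AQ))
        r = env_reward_fn(ask.item())     # e.g. -Cost(AQ) plus +1 or -1
        b = self.value(state_vec).squeeze()
        # Advantage-weighted log-prob, plus a regression loss for the baseline.
        loss = -(r - b.detach()) * dist.log_prob(ask) + (b - r) ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```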
258
Setting 2: Reinforcement Learning: results (including the "Bad Student" case).
261
Setting 2: Reinforcement Learning
Conclusions: Asking questions helps improve performance.
262
Conclusion: We explored multiple strategies for developing better chit-chat-style chatbots (mutual information, speaker consistency, reinforcement learning), and we explored how a bot can interact with users by asking questions to better complete a goal.
265
Q&A
266
Mutual Information for Response Generation.
Bayes' rule: the anti-language model.
267
Mutual Information for Response Generation.
Anti-language model: train P(T|S) and P(T); at decoding time, subtract the language-model term. Used naively, this produces ungrammatical responses !
271
Mutual Information for Response Generation.
Solution 1: only penalize the language-model probability of the first few words.
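A hedged reconstruction of the weighted anti-LM score this slide describes, where g(k) keeps the penalty only on the first γ words (λ and γ are hyperparameters):

```latex
\log p(T \mid S) \;-\; \lambda \sum_{k=1}^{|T|} g(k)\,\log p(t_k \mid t_1,\dots,t_{k-1}),
\qquad
g(k) = \begin{cases} 1 & k \le \gamma \\ 0 & k > \gamma \end{cases}
```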
276
Mutual Information for Response Generation.
Direct decoding is infeasible, so: train P(T|S) and P(S|T); generate an N-best list using P(T|S); rerank the N-best list using P(S|T).
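A hedged sketch of that pipeline; `beam_search` and `log_prob` stand in for whatever decoding and scoring interfaces a real implementation exposes, and the weight `lam` is an assumed hyperparameter.

```python
# MMI-bidi reranking sketch: forward model proposes, backward model rescores.
def mmi_rerank(forward, backward, source, n_best=200, lam=0.5):
    # candidates: list of (response, log p(T|S)) pairs from beam search.
    candidates = forward.beam_search(source, beam_size=n_best)

    def score(cand):
        t, log_p_t_given_s = cand
        return (1 - lam) * log_p_t_given_s + lam * backward.log_prob(source, given=t)

    return max(candidates, key=score)[0]
```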
280
Results (No cherry-picking)
282
Persona seq2seq model: tradeoff. [Figure: two persona-conditioned encoder-decoder runs ("where do you live EOS" decoding to "in uk .") compared.]