Teaching Machines to Converse


1 Teaching Machines to Converse
Jiwei Li Computer Science Department Stanford University

2 Collaborators Bill Dolan Microsoft Research Dan Jurafsky Stanford
Alan Ritter Ohio State University Chris Brockett Microsoft Research Jason Weston Facebook AI Research Alexander Miller Facebook AI Research Sumit Chopra Facebook AI Research Marc'Aurelio Ranzato Facebook AI Research Michel Galley Microsoft Research Will Monroe Stanford Jianfeng Gao Microsoft Research

3

4 Borrowed From Bill MacCartney’s slides
Many people consider Siri a big breakthrough in AI. Borrowed From Bill MacCartney’s slides

5 Does Siri really understand language ?
Many of us have seen this. Borrowed From Bill MacCartney’s slides

6 Does Siri really understand language ?
Cole-bear Borrowed From Bill MacCartney’s slides

8 Does Siri really understand language ?
Many of us have seen this. Borrowed From Bill MacCartney’s slides


13 Does Siri really understand language ?
Many of us have seen this. Slide Borrowed From Bill MacCartney

14 Slide From Bill MacCartney
How well a machine can talk with humans has long been associated with the general success of AI. The attempt to develop a chatbot dates back to the early days of AI. Slide From Bill MacCartney

15 Why is building a chatbot hard ?
Computers need to understand what you ask.

16 Why is building a chatbot hard ?
Computers need to understand what you ask. Computers need to generate coherent, meaningful sequences in response to what you ask,

17 Why is building a chatbot hard ?
Computers need to understand what you ask. Computers need to generate coherent, meaningful sequences in response to what you ask, which requires domain knowledge, discourse knowledge, and world knowledge.

18 Background

19 Background Goal Oriented Tasks
They are expensive and hard to extend to open-domain scenarios (Ritter et al., 2010; Sordoni et al., 2015; Vinyals and Le, 2015)

20 Background Goal Oriented Tasks (Levin et al., 1997;
They are expensive and hard to extend to open-domain scenarios (Levin et al., 1997; Young et al., 2013; Walker, 2000) (Ritter et al., 2010; Sordoni et al., 2015; Vinyals and Le, 2015)

21 Outline Mutual Information for Response Generation. (Chitchat)
How to preserve Speaker Consistency (Chitchat) Reinforcement learning for Response Generation (Chitchat) Teaching a bot to ask questions (Goal-oriented)

22 Mutual Information for Response Generation.

23 Seq2Seq Models for Response Generation
(Sutskever et al., 2014; Jean et al., 2014; Luong et al., 2015) Source : Input Messages Target : Responses fine . I’m EOS We can adapt this framework to response generation, in which input messages are sources and output responses are targets Encoding Decoding how are you ? eos I’m fine .

24 Seq2Seq Models for Response Generation
how are you ?

25 Seq2Seq Models for Response Generation
Encoding how are you ?


29 Seq2Seq Models for Response Generation
Encoding Decoding how are you ?

30 Seq2Seq Models for Response Generation
I’m Encoding Decoding how are you ? eos

31 Seq2Seq Models for Response Generation
I’m fine Encoding Decoding how are you ? eos I’m

32 Seq2Seq Models for Response Generation
I’m fine . Encoding Decoding how are you ? eos I’m fine

33 Seq2Seq Models for Response Generation
I’m fine . EOS Encoding Decoding how are you ? eos I’m fine .

34 Seq2Seq Models as a Backbone
I’m fine . EOS Encoding Decoding how are you ? eos I’m fine .
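To make this concrete, here is a minimal sketch of such an encoder-decoder in PyTorch. The layer sizes, the single-layer LSTMs, and the class interface are illustrative assumptions, not the configuration used in these experiments.

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    # Minimal encoder-decoder for response generation (illustrative sizes).
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src, tgt):
        # Encode the source message ("how are you ? eos") into a final state.
        _, state = self.encoder(self.embed(src))
        # Decode the response ("I'm fine . EOS") conditioned on that state.
        dec_out, _ = self.decoder(self.embed(tgt), state)
        return self.out(dec_out)  # per-step logits over the vocabulary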

35 Mutual Information for Response Generation.
Li et al., A Diversity-Promoting Objective Function for Neural Conversation Models (to appear, NAACL 2016). The “I don’t know” problem (Sordoni et al., 2015; Serban et al., 2015)

36 Mutual Information for Response Generation.
Li et al., A Diversity-Promoting Objective Function for Neural Conversation Models (to appear, NAACL 2016). The “I don’t know” problem (Sordoni et al., 2015; Serban et al., 2015). How old are you ? I don’t know .

37 I don’t know what you are talking about.
Mutual Information for Response Generation. Li et al., A Diversity-Promoting Objective Function for Neural Conversation Models (to appear, NAACL 2016). The “I don’t know” problem (Sordoni et al., 2015; Serban et al., 2015). How is life ? I don’t know what you are talking about. Roughly 30% of all generated responses show this behavior.

38 I don’t know what you are talking about.
Mutual Information for Response Generation. Li et al., A Diversity-Promoting Objective Function for Neural Conversation Models (to appear, NAACL 2016). The “I don’t know” problem (Sordoni et al., 2015; Serban et al., 2015). Do you love me ? I don’t know what you are talking about. Roughly 30% of all generated responses are dull responses of this kind.

39 Mutual Information for Response Generation.
def chatbot(string):
    if string.endswith("?"):
        return "I don't know"
    else:
        return "I don't know what you are talking about"
It is not unreasonable to generate such responses, but developing a chatbot is not just about generating reasonable responses.

40 Mutual Information for Response Generation.
Solution #1: Adding Rules

41 Mutual Information for Response Generation.
Solution #1: Adding Rules I don’t know . I don’t know .. I don’t know … ... I don’t know ! I don’t know ! ! I don’t know ! ! ! Regular expression matching

42 Mutual Information for Response Generation.
Solution #1: Adding Rules
I don’t know .
I don’t know ..
I don’t know … ...
I don’t know !
I don’t know ! !
I don’t know ! ! !
I have no idea .
I don’t have a clue .
I haven’t the faintest idea .
How should I know ?
I don’t have the foggiest idea what you are talking about .
I don’t have the slightest idea what you are talking about .
Unfortunately, deep learning models manage to cluster all phrases that are semantically related to “I don’t know”. The most comprehensive “I don’t know” list I have ever seen …

43 Mutual Information for Response Generation.
Solution #1: Adding Rules (the same list of “I don’t know” variants as above). Rules don’t work !!

44 Mutual Information for Response Generation.

46 Mutual Information for Response Generation.
“I don’t know” Whatever one asks

47 Mutual Information for Response Generation.
“I don’t know” What one asks

48 Mutual Information for Response Generation.
“I don’t know” What one asks The other way around “I don’t know” What one asks

49 Mutual Information for Response Generation.


52 Mutual Information for Response Generation.
Bayes’ Rule

53 Mutual Information for Response Generation.
Bayes’ Rule. Standard Seq2Seq model

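Written out, the objective from the NAACL 2016 paper (a reconstruction; \lambda is the tradeoff weight between the two terms):

\hat{T} = \arg\max_{T} \left\{ \log p(T \mid S) - \lambda \log p(T) \right\}
        = \arg\max_{T} \left\{ (1 - \lambda) \log p(T \mid S) + \lambda \log p(S \mid T) \right\}

The second form follows from Bayes' rule (the dropped \lambda \log p(S) term is constant in T). Requiring the response T to also predict the source S back penalizes generic responses like “I don’t know”, which are likely under p(T) but carry no information about S.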

57 Datasets and Evaluations
Datasets: Twitter Conversation Dataset (23M pairs); OpenSubtitles movie script dataset (80M pairs)

58 Datasets and Evaluations
Datasets: Twitter Conversation Dataset (23M pairs); OpenSubtitles movie script dataset (80M pairs). Evaluations: BLEU (Papineni et al., 2002); #Distinct tokens; Human Evaluation (1000 samples, each output is evaluated by 7 judges). Example:

59 Datasets and Evaluations
BLEU improvements (chart): +26.4%, +51.3%, +12.7%, +35.0%, +22.5%

60 Datasets and Evaluations
BLEU on Twitter Dataset (chart): +21.1%, +12.7%

61 Datasets and Evaluations
# Distinct tokens in generated targets (divided by total #) on the OpenSubtitles dataset (chart): +385%, +122%
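The distinct-token metric above is simple to compute. A sketch, assuming whitespace tokenization over the generated targets:

def distinct_tokens(responses):
    # Number of distinct tokens in the generated targets, divided by the total count.
    tokens = [tok for resp in responses for tok in resp.split()]
    return len(set(tokens)) / len(tokens)

print(distinct_tokens(["i don 't know .", "i 'm fine ."]))  # 7 distinct / 9 total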

62 Human Evaluation

64 Sampled Results: Standard Seq2Seq p(t|s) vs. Mutual Information

65 Outline Mutual Information for Response Generation.
Speaker Consistency Reinforcement learning for Response Generation Teaching a bot to ask questions

66 Speaker Consistency Li et al., A Persona-Based Neural Conversation Model (ACL 2016)

67 Speaker Consistency How old are you ? I’m 8 .

68 Speaker Consistency How old are you ? I’m 8 . What’s your age? 18

69 Speaker Consistency Where do you live now? I live in Los Angeles.

70 In which city do you live now?
Speaker Consistency Where do you live now? I live in Los Angeles. In which city do you live now? I live in Paris.

71 Speaker Consistency Where do you live now? I live in Los Angeles.
In which city do you live now? I live in Paris. In which country do you live now? England, you?

72 Speaker Consistency How old are you ? I’m 8.

73 How many kids do you have ?
Speaker Consistency How old are you ? I’m 8. How many kids do you have ? 4, you ?

74 Speaker Consistency When were you born ? In 1942.

75 When was your mother born ?
Speaker Consistency When were you born ? In 1942. When was your mother born ? In 1966.

76 How to represent users Persona embeddings (70k) Bob

77 How to represent users: Word embeddings (50k) and Persona embeddings (70k)
(Embedding-space illustration: word embeddings such as uk, london, sydney, great, good, stay, live, okay, monday, tuesday; a persona embedding for user Bob.)

78 Persona seq2seq model Encoding Decoding where do you live EOS

79 Persona seq2seq model Encoding Decoding Bob where do you live EOS Bob
Persona embeddings (70k) Bob

80 Persona seq2seq model Encoding Decoding Bob in where do you live EOS
Persona embeddings (70k) Bob

81 Persona seq2seq model Encoding Decoding Bob Bob in uk where do you
live EOS in Bob Persona embeddings (70k)

82 Persona seq2seq model Encoding Decoding Bob Bob Bob in uk . where do
you live EOS in uk Bob Persona embeddings (70k)

83 Persona seq2seq model Encoding Decoding Bob Bob Bob Bob in uk . EOS
where do you live EOS in uk . If you ask one user 100 questions, the 100 responses you generate are not independent, because the same user representation is incorporated into every response. Word embeddings (50k). Persona embeddings (70k): Bob
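A minimal sketch of how the persona embedding can be injected at every decoding step, in PyTorch. The dimensions and the concatenation scheme are illustrative assumptions.

import torch
import torch.nn as nn

class PersonaDecoder(nn.Module):
    def __init__(self, vocab_size, n_speakers, embed_dim=256, persona_dim=128, hidden_dim=512):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.persona = nn.Embedding(n_speakers, persona_dim)  # one vector per user (70k in the talk)
        self.lstm = nn.LSTM(embed_dim + persona_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tgt, speaker_id, enc_state):
        words = self.word_embed(tgt)                     # (B, T, embed_dim)
        persona = self.persona(speaker_id).unsqueeze(1)  # (B, 1, persona_dim)
        persona = persona.expand(-1, words.size(1), -1)  # repeated at every step
        out, _ = self.lstm(torch.cat([words, persona], dim=-1), enc_state)
        return self.out(out)

Because the same persona vector conditions every decoding step, the answers generated for a given user are tied together, which is what pushes the model toward speaker consistency.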

84 Interaction Seq2Seq model
Encoding where do you live. Models speaker-addressee interaction patterns within the conversation: add speaker and addressee embeddings.

85 Interaction Seq2Seq model
Encoding where do you live tanh(W1·vi + W2·vj)

86 Interaction Seq2Seq model
Encoding Decoding where do you live EOS tanh(W1·vi + W2·vj)

87 Interaction Seq2Seq model
uk Encoding Decoding where do you live EOS in tanh(W1·vi + W2·vj)

88 Interaction Seq2Seq model
uk . Encoding Decoding where do you live EOS in uk
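The interaction shown on these slides combines the speaker vector v_i with the addressee vector v_j (as in the accompanying paper):

V_{i,j} = \tanh\left( W_1 v_i + W_2 v_j \right)

V_{i,j} is then fed into the decoder at each step in place of a single persona vector, letting the model adapt its responses to who is speaking to whom.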

89 Datasets and Evaluations
Conversations from Twitter: 28M turns; 74,003 users, each with a minimum of 60 conversational turns. Evaluations: Perplexity; BLEU (4,000 examples, single reference); Human Evaluation

90 Quantitative Results (Seq2Seq vs. Speaker Model)
Perplexity: 47.2 vs. 42.2 (-10.6%)
BLEU (without MMI): 0.92 vs. 1.12 (+21.7%)
BLEU (with MMI): 1.41 vs. 1.66 (+11.7%)

91 Human Evaluation Question Pairs

92 Human Evaluation Question Pairs What city do you live in ?
What country do you live in ?

93 Human Evaluation Question Pairs What city do you live in ?
What country do you live in ? Are you vegan or vegetarian ? Do you eat beef ?

94 Human Evaluation Question Pairs What city do you live in ?
What country do you live in ? London/UK London/US

95 Human Evaluation Which Model produces more consistent answers ?
Each item is given to 5 judges. Ties are discarded. (Scoring sheet: Seq2Seq Model vs. Persona Model; Item1: +1; Item2: …)

96 Human Evaluation: Seq2Seq Model 0.84 vs. Persona Model 1.33 (+34.7%)

97 Results (No cherry-picking)


101 Issues How do we handle long-term dialogue success?

102 Outline Mutual Information for Response Generation.
Speaker Consistency Reinforcement learning for Response Generation Teaching a bot to ask questions

103 Issues How do we handle long-term dialogue success?
Problem 1: Dull and generic responses.

104 I don’t know what you are talking about.
Issues Problem 1: Dull and generic responses. The “I don’t know” problem (Sordoni et al., 2015; Serban et al., 2015). Do you love me ? I don’t know what you are talking about.

105 Issues How do we handle long-term dialogue success?
Problem 1: Dull and generic responses. Problem 2: Repetitive responses.

106 Problem 2: Repetitive responses.
Shut up !

107 Problem 2: Repetitive responses.
Shut up ! No, you shut up !

108 Problem 2: Repetitive responses.
Shut up ! No, you shut up ! No, you shut up !

109 Problem 2: Repetitive responses.
Shut up ! No, you shut up ! No, you shut up ! No, you shut up !

110 …… Problem 2: Repetitive responses. Shut up ! No, you shut up !

111 …… Problem 2: Repetitive responses. See you later ! See you later !

112 Issues How do we handle long-term dialogue success?
Problem 1: Dull and generic responses. Problem 2: Repetitive responses. Problem 3: Short-sighted conversation decisions.

113 Problem 3: Short-sighted conversation decisions.
How old are you ?

114 Problem 3: Short-sighted conversation decisions.
How old are you ? i 'm 16 .

115 Problem 3: Short-sighted conversation decisions.
How old are you ? i 'm 16 . 16 ?

116 Problem 3: Short-sighted conversation decisions.
How old are you ? i 'm 16 . 16 ? i don 't know what you 're talking about

118 Problem 3: Short-sighted conversation decisions.
How old are you ? i 'm 16 . 16 ? i don 't know what you 're talking about you don 't know what you 're saying

119 Problem 3: Short-sighted conversation decisions.
How old are you ? i 'm 16 . 16 ? i don 't know what you 're talking about you don 't know what you 're saying i don 't know what you 're talking about

120 Problem 3: Short-sighted conversation decisions.
How old are you ? i 'm 16 . 16 ? i don 't know what you 're talking about you don 't know what you 're saying i don 't know what you 're talking about you don 't know what you 're saying

121 Problem 3: Short-sighted conversation decisions.
Bad Action How old are you ? i 'm 16 . 16 ? i don 't know what you 're talking about you don 't know what you 're saying i don 't know what you 're talking about you don 't know what you 're saying

122 Problem 3: Short-sighted conversation decisions.
How old are you ? i 'm 16 . 16 ? i don 't know what you 're talking about you don 't know what you 're saying i don 't know what you 're talking about you don 't know what you 're saying Outcome

123 Can reinforcement learning handle this?
How old are you ? i 'm 16 . 16 ? i don 't know what you 're talking about you don 't know what you 're saying i don 't know what you 're talking about you don 't know what you 're saying Outcome does not emerge until a few turns later

124 Can reinforcement learning handle this?

125 Notations for Reinforcement Learning

126 Notations: State How old are you ? how old are you Encoding

127 Notations: Action How old are you ? i 'm 16 .

128 Notations: Reward How old are you ? i 'm 16 .

129 Notations: Reward 1. Ease of answering

131 Notations: Reward 1. Ease of answering
We propose to measure the ease of answering a generated turn by using the negative log likelihood of responding to that utterance with a dull response S: ”I don’t know what you are talking about”

132 Notations: Reward 2. Information Flow

133 Notations: Reward 2. Information Flow See you later ! See you later !

134 Notations: Reward 2. Information Flow See you later ! S1

135 Notations: Reward 3. Meaningfulness S1 How old are you ? S2 i 'm 16 .

136 Notations: Reward Easy to answer R1 Information Flow R2
Meaningfulness R3
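Spelled out (a reconstruction from the published version of this work; S is a hand-built set of dull responses such as “I don’t know what you are talking about”, h_{p_i} is the encoder representation of turn p_i, and a is the candidate response):

r_1 = -\frac{1}{N_{\mathbb{S}}} \sum_{s \in \mathbb{S}} \frac{1}{N_s} \log p_{\text{seq2seq}}(s \mid a)  (ease of answering: dull follow-ups should be unlikely)

r_2 = -\log \frac{h_{p_i} \cdot h_{p_{i+1}}}{\lVert h_{p_i} \rVert \, \lVert h_{p_{i+1}} \rVert}  (information flow: penalize semantic repetition across turns)

r_3 = \frac{1}{N_a} \log p_{\text{seq2seq}}(a \mid q_i, p_i) + \frac{1}{N_{q_i}} \log p^{\text{backward}}_{\text{seq2seq}}(q_i \mid a)  (meaningfulness: mutual information with the context)

r = \lambda_1 r_1 + \lambda_2 r_2 + \lambda_3 r_3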

137 A message from training set
Simulation A message from training set

138 A message from training set
Simulation Encode A message from training set

139 A message from training set
Simulation Encode A message from training set Decode r1

140 A message from training set
Simulation Encode A message from training set Encode Decode r1

141 A message from training set
Simulation Encode A message from training set Encode Decode r1 Decode r2

142 … Turn 1 Turn 2 Turn N Input Message Encode Decode Encode Decode
Sn

143 … Compute Accumulated Reward R(S1,S2,…,Sn) Turn 1 Turn 2 Turn N Input
Message Encode Decode Encode Decode Encode Decode S1 S2 Sn Compute Accumulated Reward R(S1,S2,…,Sn)
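A minimal sketch of this simulation loop; the agent interface and the reward hooks are assumed names, not the released code.

def simulate(agent, message, n_turns=5):
    # Roll out a dialogue: the bot talks with a copy of itself,
    # starting from a message sampled from the training set.
    turns = [message]
    for _ in range(n_turns):
        turns.append(agent.respond(turns))  # encode the history, decode turn S_i
    return turns

def accumulated_reward(turns, reward_fns, weights):
    # R(S1, S2, ..., Sn): weighted sum of the per-turn rewards.
    total = 0.0
    for i in range(1, len(turns)):
        for fn, w in zip(reward_fns, weights):
            total += w * fn(turns[i], turns[:i])  # reward of turn S_i given its history
    return total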

144 … Easy to answer Turn 1 Turn 2 Turn N Input Message Encode Decode
Sn Easy to answer

145 … Easy to answer R1 Information Flow R2 Turn 1 Turn 2 Turn N Input
Message Encode Decode Encode Decode Encode Decode S1 S2 S3 Easy to answer R1 Information Flow R2

146 … Easy to answer R1 Information Flow R2 Meaningfulness R3 Turn 1
Turn N Input Message Encode Decode Encode Decode Encode Decode S1 S2 S3 Easy to answer R1 Information Flow R2 Meaningfulness R3


149 … Easy to answer R1 Information Flow R2 Meaningfulness R3
Turn 1 Turn 2 Turn N Input Message Encode Decode Encode Decode Encode Decode S1 S2 S3 Easy to answer R1 Information Flow R2 Meaningfulness R3 Compute Accumulated Reward R(S1,S2,…,Sn)

150 … REINFORCE Algorithm (Williams, 1992) Turn 1 Turn 2 Turn N Input
Message Encode Decode Encode Decode Encode Decode S1 S2 Sn REINFORCE Algorithm (Williams, 1992)

156 … REINFORCE Algorithm (Williams, 1992) Turn 1 Turn 2 Turn N Input
Message Encode Decode Encode Decode Encode Decode S1 S2 Sn REINFORCE Algorithm (Williams, 1992) What we want to learn
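The gradient is the standard REINFORCE estimator: sample a full dialogue from the current policy p_\theta, score it with the accumulated reward, and reinforce the sampled turns in proportion to that score,

\nabla J(\theta) \approx \big( R(S_1, \dots, S_n) - b \big) \, \nabla_\theta \sum_{i} \log p_\theta(S_i \mid S_{<i})

where b is a baseline subtracted to reduce the variance of the estimate (a sketch; the exact baseline is not shown on the slides).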

157 Details 1. Initialize policy using a pre-trained Sequence to Sequence model

158 Details 1. Initialize policy using a pre-trained Sequence to Sequence model 2. Curriculum learning: gradually increase the number of simulated turns.

159 Evaluation Baselines: Vanilla Seq2seq model
Mutual information Seq2seq model

160 Evaluation 1. Number of Simulated Turns

161 Evaluation 2. Diversity

162 Evaluation


166 Results
Input | Mutual Information | The proposed model
How old are you ? | I’m 16. | I’m 16. why are you asking ?
what is your full name ? | i have no idea | what 's yours ?
I don 't want to go home tonight . | Really ? | Why ?
Do you have any feelings for me ? | I don’t know what you are talking about. | Would I see you if I didn 't ?


170 Simulation How old are you ?

171 Simulation How old are you ? i 'm 16, why are you asking ?

172 Simulation How old are you ? i 'm 16, why are you asking ?
I thought you were 12 .

173 Simulation How old are you ? i 'm 16, why are you asking ?
I thought you were 12 . What made you think so ?

174 I don’t know what you are talking about .
Simulation How old are you ? i 'm 16, why are you asking ? I thought you were 12 . What made you think so ? I don’t know what you are talking about .

175 Simulation How old are you ? i 'm 16, why are you asking ?
I thought you were 12 . What made you think so ? I don’t know what you are talking about . You don’t know what you are saying.

176 Simulation How old are you ? i 'm 16, why are you asking ?
I thought you were 12 . What made you think so ? I don’t know what you are talking about . You don’t know what you are saying. I don’t know what you are talking about .

177 Simulation Survives 4 turns !! How old are you ?
i 'm 16, why are you asking ? I thought you were 12 . What made you think so ? I don’t know what you are talking about . You don’t know what you are saying. I don’t know what you are talking about .

178 Ongoing Work Better automatic evaluation metrics. (BLEU ? Perplexity ?)

179 Ongoing Work Turing Test
Better automatic evaluation metrics. (BLEU ? Perplexity ?) Turing Test

180 Future Work Turing Test Generative Adversarial Nets
Better automatic evaluation metrics. (BLEU ? Perplexity ?) Turing Test Generative Adversarial Nets

181 Outline Mutual Information for Response Generation. (Chitchat)
How to preserve Speaker Consistency (Chitchat) Reinforcement learning for Response Generation (Chitchat) Teaching a bot to ask questions (Goal-oriented)

182 How do you like Hom Tanks?
Introduction How do you like Hom Tanks?

183 How do you like Hom Tanks?
Introduction Case 1 How do you like Hom Tanks? Who is Hom Tanks?

184 How do you like Hom Tanks?
Introduction Case 1 How do you like Hom Tanks? Who is Hom Tanks ? Do you mean Tom Hanks ?

185 Introduction Case 1 How do you like Hom Tanks? Who is Hom Tanks?
Hom Tanks is the leading actor in Forrest Gump.

186 Introduction Case 1 How do you like Hom Tanks? Who is Hom Tanks?
Hom Tanks is the leading actor in Forrest Gump. Oh. Yeah. I like him a lot.

187 How do you like Hom Tanks?
Introduction What will Current Chatbot Systems Do ? How do you like Hom Tanks?

188 Introduction What will Current Chatbot Systems Do ?
How do you like Hom Tanks? How do you like Hom Tanks? UNK

189 How do you like Hom Tanks?
Introduction What will Current Chatbot Systems Do ? How do you like Hom Tanks? How do you like UNK ?

190 How do you like Hom Tanks?
Introduction What will Current Chatbot Systems Do ? How do you like Hom Tanks? How do you like UNK ? Give an output anyway

191 How do you like Hom Tanks?
Introduction What will Current Chatbot Systems Do ? How do you like Hom Tanks? How do you like UNK ? Forward Backward softmax

192 Introduction What will Current Chatbot Systems Do ?
How do you like Hom Tanks? How do you like UNK ? output I hate him. He’s such a jerk. Forward Backward softmax

193 Introduction What will Current Chatbot Systems Do ?
How do you like Hom Tanks? Searching the Web for “how do you like Hom Tanks”

194 MovieQA Domain

195 MovieQA Domain Template

197 In what scenarios does a bot need to ask questions ?

198 In what scenarios does a bot need to ask questions ?
Case 1: Question Clarification


206 In what scenarios does a bot need to ask questions ?
Case 1: Question Clarification Task 1

207 In what scenarios does a bot need to ask questions ?
Case 1: Question Clarification Task 1 Task 2

208 In what scenarios does a bot need to ask questions ?
Question Templates Case 1: Question Clarification Task 1

209 In what scenarios does a bot need to ask questions ?
Question-Asking Templates Case 1: Question Clarification Task 1

210 In what scenarios does a bot need to ask questions ?
Case 2: Knowledge Operation.

211 Case 2: Knowledge Operation.
Question-Asking Templates

212 Case 2: Knowledge Operation.
Task 3 Task 4

213 In what scenarios does a bot need to ask questions ?
Case 3: Knowledge Acquisition .

214 In what scenarios does a bot need to ask questions ?
Case 3: Knowledge Acquisition . Not in the KB … Other questions/ Other answers

215 Settings

216 1. Off-line supervised settings

217 Training Input … Other questions/ Other answers

218 Training Input … Other questions/ Other answers The teacher’s question

219 Training Input … Other questions/ Other answers Dialogue History

220 Training Input … Other questions/ Other answers KB facts

221 Training Input … Other questions/ Other answers Output

222 Input Output … Other questions/ Other answers
End-to-End Memory Networks Output
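A single memory "hop" of such a model, sketched in PyTorch. End-to-End Memory Networks use separate input and output memory embeddings and multiple hops; this compressed one-hop version is only illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

def memory_hop(query, memory, out_layer):
    # query:  (B, d)    embedding of the teacher's question
    # memory: (B, M, d) embeddings of the dialogue history and KB facts
    scores = torch.bmm(memory, query.unsqueeze(-1)).squeeze(-1)  # match the question to each memory
    attn = F.softmax(scores, dim=-1)
    read = torch.bmm(attn.unsqueeze(1), memory).squeeze(1)       # attention-weighted memory readout
    return out_layer(query + read)                               # logits over candidate outputs

out_layer = nn.Linear(64, 1000)  # d = 64, 1000 candidate answers (illustrative sizes)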

223 Training Settings

224 Training Settings: 1. Never Asking Questions (TrainQA)
Each training setting corresponds to a way of generating a dataset.

227 Training Settings: 1. Never Asking Questions (TrainQA)
2. Always Asking Questions (TrainAQ)


229 Test Settings: 1. Never Asking Questions (TestQA)
2. Always Asking Questions (TestAQ)

230 Make Predictions Test Settings: 1. Never Asking Questions (TestQA)
2. Always Asking Questions (TestAQ) ????? Make Predictions ?????

231 Tasks 1-9: results grid over training settings (TrainQA, TrainAQ) and test settings (TestQA, TestAQ)

232 Results

233 Results Asking questions always helps at test time.

234 Results Asking questions always helps at test time. Only asking questions at training time does not help

235 Results Asking questions always helps at test time. Only asking questions at training time does not help. TrainAQ+TestAQ performs the best.

236 Setting2: Reinforcement Learning
Shall I ask a question ???

237 Setting2: Reinforcement Learning

238 Setting2: Reinforcement Learning
Ask a question or not …..

239 Setting2: Reinforcement Learning
Ask a question or not ….. If Yes

240 Setting2: Reinforcement Learning
Ask a question or not ….. If Yes Get Penalized by Cost(AQ)



244 Setting2: Reinforcement Learning
Ask a question or not ….. If Yes Get Penalized by Cost(AQ) +1

245 Setting2: Reinforcement Learning
Ask a question or not ….. If Yes Get Penalized by Cost(AQ) -1

246 Setting2: Reinforcement Learning
Ask a question or not ….. If Yes If No Get Penalized by Cost(AQ)


248 Setting2: Reinforcement Learning
Ask a question or not ….. If Yes If No +1 Get Penalized by Cost(AQ)

249 Setting2: Reinforcement Learning
Ask a question or not ….. If Yes If No -1 Get Penalized by Cost(AQ)


252 Setting2: Reinforcement Learning
Memory Network Ask a question or not ….. Get Penalized by r

253 Setting2: Reinforcement Learning
Memory Network Ask a question or not ….. Get Penalized by r Memory Network Memory Network


255 Policy Gradient Setting2: Reinforcement Learning Memory Network
Ask a question or not ….. Policy Gradient


257 Policy Gradient Setting2: Reinforcement Learning Baseline
Memory Network Ask a question or not ….. Policy Gradient Baseline
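The reward structure sketched on these slides, in code; the Cost(AQ) value and the function name are illustrative assumptions.

def episode_reward(asked, answered_correctly, cost_aq=0.2):
    # +1 for a correct final answer, -1 for a wrong one;
    # asking the teacher a clarifying question first incurs Cost(AQ).
    r = 1.0 if answered_correctly else -1.0
    if asked:
        r -= cost_aq
    return r

The binary ask-or-not decision is sampled from the memory network's policy and trained with the policy gradient, with a baseline subtracted from the reward to reduce variance.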

258 Setting2: Reinforcement Learning
Bad Student

259 Setting2: Reinforcement Learning

260 Setting2: Reinforcement Learning

261 Setting2: Reinforcement Learning
Conclusions: Asking questions helps improve performance.

262 Conclusion We explored multiple strategies to develop better chit-chat-style chatbots

263 Conclusion We explored multiple strategies to develop better chit-chat-style chatbots (mutual information, speaker consistency, reinforcement learning)

264 Conclusion We explored multiple strategies to develop better chit-chat-style chatbots (mutual information, speaker consistency, reinforcement learning). We explored how a bot can interact with users by asking questions to better complete a goal.

265 Q&A

266 Mutual Information for Response Generation.
Bayes’ Rule Anti-Language Model

267 Mutual Information for Response Generation.

268 Mutual Information for Response Generation.
Training P(T|S) and P(T) Decoding

269 Mutual Information for Response Generation.
Anti-language Model Training P(T|S) and P(T) Decoding

270 Mutual Information for Response Generation.
Ungrammatical Responses !

271 Mutual Information for Response Generation.

272 Mutual Information for Response Generation.
Solution 1


275 Mutual Information for Response Generation.
Solution 1: Only penalize the first few words

276 Mutual Information for Response Generation.
Direct Decoding is infeasible

277 Mutual Information for Response Generation.
Direct Decoding is infeasible Train P(T|S), P(S|T)

278 Mutual Information for Response Generation.
Direct Decoding is infeasible Train P(T|S), P(S|T) Generate N-best list using P(T|S)

279 Mutual Information for Response Generation.
Direct Decoding is infeasible Train P(T|S), P(S|T) Generate N-best list using P(T|S) Rerank the N-best list using P(S|T)
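A sketch of this reranking decoder; lambda and gamma are tunable weights, and the two scoring functions are assumed interfaces over the trained forward and backward models.

def mmi_rerank(src, nbest, log_p_fwd, log_p_bwd, lam=0.5, gamma=0.1):
    # nbest: candidate responses generated by beam search under p(T|S).
    # Score = log p(T|S) + lam * log p(S|T) + gamma * |T|  (length term).
    def score(t):
        return log_p_fwd(src, t) + lam * log_p_bwd(t, src) + gamma * len(t.split())
    return max(nbest, key=score)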

280 Results (No cherry-picking)

281 Results (No cherry-picking)

282 Persona seq2seq model: Tradeoff
(Two persona seq2seq diagrams, with the persona embedding Bob injected at each decoding step: where do you live EOS → in uk . EOS)

