Learning for Dialogue.

Frame-Based Models
User: I’d like to schedule an appointment.
(System calls up the appointment frame: People: ____ Time: ____ Location: ____)
System: Who is the other party?
User: John, sometime on Tuesday.
(Frame after this turn: People: John Time: Tuesday Location: ____)
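A frame like the one on this slide can be sketched as a small record with one field per slot. This is only an illustrative sketch: the class name `AppointmentFrame`, the `missing` and `next_question` helpers, and the prompts for the Time and Location slots are assumptions, not part of the lecture. Only "Who is the other party?" comes from the slide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AppointmentFrame:
    # One attribute per slot; None means the slot is still empty.
    people: Optional[str] = None
    time: Optional[str] = None
    location: Optional[str] = None

    def missing(self) -> List[str]:
        """Names of the slots that have not been filled yet, in slot order."""
        return [name for name, value in vars(self).items() if value is None]


# Hypothetical prompts; only the "people" prompt appears on the slide.
PROMPTS = {
    "people": "Who is the other party?",
    "time": "When would you like to meet?",
    "location": "Where would you like to meet?",
}


def next_question(frame: AppointmentFrame) -> Optional[str]:
    """Ask about the first empty slot, or return None when the frame is full."""
    empty = frame.missing()
    return PROMPTS[empty[0]] if empty else None


frame = AppointmentFrame()
first_prompt = next_question(frame)   # asks about the "people" slot

# User: "John, sometime on Tuesday" fills two slots at once.
frame.people = "John"
frame.time = "Tuesday"
remaining = frame.missing()
```

After the second user turn only the Location slot is left, which matches the partially filled frame shown on the slide.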

Grounding in Frames
Two options: ground after each slot is filled, or after the entire form is finished.
After each slot is filled:
User: John, sometime on Tuesday.
System: OK, where do you want to meet John on Tuesday?
After the entire form is finished:
System: OK, I am scheduling you in Room 332 with John on Tuesday at 4pm.
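The two grounding options can be contrasted with a short sketch. The function name `confirmations`, the strategy labels `"per_turn"` and `"at_end"`, and the wording of the confirmation strings are all hypothetical; the sketch only shows where in the dialogue each strategy produces a confirmation.

```python
from typing import Dict, List


def confirmations(turns: List[Dict[str, str]], strategy: str) -> List[str]:
    """Return the confirmations the system would utter.

    turns    -- slots filled by the user on each turn, in order
    strategy -- "per_turn" grounds after every turn that fills slots,
                "at_end" grounds once after the whole form is filled
    """
    frame: Dict[str, str] = {}
    spoken: List[str] = []
    for filled in turns:
        frame.update(filled)
        if strategy == "per_turn":
            heard = ", ".join(f"{slot}: {value}" for slot, value in filled.items())
            spoken.append(f"OK, I have {heard}.")
    if strategy == "at_end":
        summary = ", ".join(f"{slot}: {value}" for slot, value in frame.items())
        spoken.append(f"OK, I am scheduling {summary}.")
    return spoken


dialogue = [
    {"People": "John", "Time": "Tuesday"},
    {"Location": "Room 332"},
]
per_turn = confirmations(dialogue, "per_turn")
at_end = confirmations(dialogue, "at_end")
```

Grounding per turn yields one confirmation for each user turn, while grounding at the end yields a single summary of the whole frame.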

How should we decide?
By fiat? Always at the end, or always after a prompt → not flexible.
User studies? Examine which is preferred → huge time cost for developers.

Other similar decisions
What is the appropriate order to ask questions? How to resolve conflicts? When to combine two questions?

Learn to Interpret Responses
Map words to semantics. E.g. “Yes”, “Yeah”, “Uh-huh” are all positive.
Learn how to extract information from a complex statement:
“Anytime after 2pm.”
“2pm or later, please.”
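As a point of comparison for what a learned interpreter would have to discover, here is a hand-written sketch of both mappings. Everything in it is an assumption made for illustration: the `POSITIVE` word list, the function names, the regular expression, and the decision to treat "after 2pm" and "2pm or later" as the same lower bound.

```python
import re
from typing import Optional

# Hand-picked affirmative forms taken from the slide's examples.
POSITIVE = {"yes", "yeah", "uh-huh"}


def is_positive(utterance: str) -> bool:
    """True if the whole utterance is one of the known affirmative forms."""
    return utterance.strip().lower().rstrip(".!,") in POSITIVE


# Matches clock times such as "2pm" or "11 am".
CLOCK_TIME = re.compile(r"(\d{1,2})\s*(am|pm)", re.IGNORECASE)


def earliest_hour(utterance: str) -> Optional[int]:
    """Return a 24-hour lower bound if the utterance says 'after' or 'later'
    together with a clock time, otherwise None."""
    text = utterance.lower()
    match = CLOCK_TIME.search(text)
    if match is None or not ("after" in text or "later" in text):
        return None
    hour = int(match.group(1)) % 12
    if match.group(2).lower() == "pm":
        hour += 12
    return hour


same_meaning = earliest_hour("Anytime after 2pm.") == earliest_hour("2pm or later, please.")
```

Both example sentences reduce to the same constraint, which is the kind of equivalence the slide asks the system to learn rather than have hard-coded.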

Learn On-line
Ideally, the computer would learn over time, as it takes part in more dialogues.

Lecture Outline
Reinforcement Learning
Markov Decision Processes

Learning Frameworks
Unsupervised learning: raw data → markup. E.g. clustering words.
Supervised learning: batch; the data is already marked up. E.g. tagging.

Online Learning
Start with an initial model.
Act with the behavior predicted by the model.
As input comes in from the world, adapt the model to match what the world gives.
Related to “Active Learning”.

Reinforcement Learning
Along with the input from the outside world there is a signal: good or bad.
Learn whether to say “Hello” or “Hiyaz”. Possible probabilities:
0.9 → Hello, 0.1 → Hiyaz
0.5 → Hello, 0.5 → Hiyaz
0.1 → Hello, 0.9 → Hiyaz

Positive reinforcement rewards good behavior
“Hello” → “Good!” → “Hello”

Negative reinforcement punishes bad behavior
“Hiyaz” → “Bad!” → “Hello”
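One simple way to picture the Hello/Hiyaz example is a probability that is nudged up after "Good!" and down after "Bad!". The sketch below assumes a linear update with a hypothetical step size `rate`, and a simulated world that always rewards "Hello" and punishes "Hiyaz"; the lecture does not specify an update rule, so treat this as one possible illustration.

```python
import random
from typing import Dict


def reinforce(policy: Dict[str, float], action: str, reward: int,
              rate: float = 0.2) -> Dict[str, float]:
    """Return a new two-action policy after feedback on `action`.

    A positive reward moves probability toward the action,
    a negative reward moves probability away from it.
    """
    p = policy[action]
    p_new = p + rate * (1.0 - p) if reward > 0 else p - rate * p
    (other,) = [a for a in policy if a != action]
    return {action: p_new, other: 1.0 - p_new}


def train(policy: Dict[str, float], rounds: int = 200, seed: int = 0) -> Dict[str, float]:
    rng = random.Random(seed)
    for _ in range(rounds):
        actions = list(policy)
        action = rng.choices(actions, weights=[policy[a] for a in actions])[0]
        # Assumed world: "Hello" earns "Good!", "Hiyaz" earns "Bad!".
        reward = 1 if action == "Hello" else -1
        policy = reinforce(policy, action, reward)
    return policy


start = {"Hello": 0.5, "Hiyaz": 0.5}
learned = train(start)
```

Starting from the 0.5/0.5 setting, both kinds of feedback push probability toward "Hello", so the learned policy ends up saying "Hello" almost always.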

More Complicated World
Say “Good Morning” in the morning, “Good night” at night.
Morning → “Good Morning”
Night → “Good night”

When is it Day? When is it Night?
[Diagram: cues “Dark out” and “Getting light”; state Night]

Reward good behavior
[Diagram: Day → “Morning”; Night → “Night”; observations Dark, Light]

Punish bad behavior
[Diagram: Day → “Night”; Night → “Morning”; observations Dark, Light]

But how can this be formalized?

Lecture Outline
Reinforcement Learning
Markov Decision Processes

State Space
States: Day, Night

State Transitions
Transitions between Day and Night

Observations
Observations: Dark, Light

Actions
Day: “Morning”, “Night”
Night: “Night”, “Morning”

Policy
Day: 0.9 “Morning”, 0.1 “Night”
Night: 0.9 “Night”

Rewards
Day: “Morning”, “Night”
Night: “Night”, “Morning”

MDP Framework
Given a fixed state space with observable transitions, determine the policy that maximizes the rewards.

Optimal Policy
Day: 1 “Morning”, 0 “Night”
Night: 1 “Night”
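For the Day/Night example the optimal policy can be read off a reward table, because what the system says does not change whether it is day or night. The sketch below assumes rewards of +1 for the matching greeting and -1 for the wrong one; these numbers and the function name `optimal_policy` are illustrative and are not given in the lecture.

```python
from typing import Dict, List, Tuple

STATES = ["Day", "Night"]
ACTIONS = ["Morning", "Night"]

# Assumed rewards: +1 for the appropriate greeting, -1 otherwise.
REWARD: Dict[Tuple[str, str], int] = {
    ("Day", "Morning"): 1,
    ("Day", "Night"): -1,
    ("Night", "Night"): 1,
    ("Night", "Morning"): -1,
}


def optimal_policy(states: List[str], actions: List[str],
                   reward: Dict[Tuple[str, str], int]) -> Dict[str, Dict[str, float]]:
    """Put probability 1 on the highest-reward action in every state.

    This one-step choice is optimal here only because the action
    has no effect on the next state.
    """
    policy = {}
    for state in states:
        best = max(actions, key=lambda action: reward[(state, action)])
        policy[state] = {a: (1.0 if a == best else 0.0) for a in actions}
    return policy


policy = optimal_policy(STATES, ACTIONS, REWARD)
```

Under these assumed rewards the result puts all probability on “Morning” during the day and on “Night” at night, in line with the slide.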

An alternative view
Actions cause a movement in the state space.
Rewards are allocated by state: if you end up in a particular state, you get a certain reward.

Actions cause State Changes
Start state: People: John, Time: Tuesday, Location: ___ (each frame has a “Grounded” column)
Actions: “Ask Location”, “Ground John/Tuesday”
Resulting states (X marks appear in the “Grounded” column):
People: John, Time: Tuesday, Location: ___
People: John, Time: Tuesday, Location: Building 12

Optimal Policy Determines a State Space Traversal
Taking an action is choosing a particular state, associated with a specific reward.
States in the diagram (X marks appear in the “Grounded” column):
People: John, Time: Tuesday, Location: ___
People: John, Time: Tuesday, Location: Building 12
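The state-based view can be sketched as a small graph in which each action leads to a next state and each state carries a reward. The state names, the extra actions “Ground everything” and “Ground Location”, and all reward values below are hypothetical; only “Ask Location” and “Ground John/Tuesday” come from the slides. The search simply tries every path, which works because this toy graph has no cycles.

```python
from typing import Dict, List, Tuple

# action -> next state, for each state (assumed, deterministic transitions)
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "start": {"Ask Location": "location_filled",
              "Ground John/Tuesday": "grounded"},
    "location_filled": {"Ground everything": "done"},
    "grounded": {"Ask Location": "grounded_location_filled"},
    "grounded_location_filled": {"Ground Location": "done"},
    "done": {},
}

# Assumed rewards by state: each extra turn costs 1, finishing pays 10.
STATE_REWARD: Dict[str, int] = {
    "start": 0,
    "location_filled": -1,
    "grounded": -1,
    "grounded_location_filled": -1,
    "done": 10,
}


def best_traversal(state: str) -> Tuple[int, List[str]]:
    """Return (total reward, action sequence) of the best path from `state`."""
    options = TRANSITIONS[state]
    if not options:
        return 0, []
    best_total, best_plan = None, []
    for action, next_state in options.items():
        future, plan = best_traversal(next_state)
        total = STATE_REWARD[next_state] + future
        if best_total is None or total > best_total:
            best_total, best_plan = total, [action] + plan
    return best_total, best_plan


total_reward, plan = best_traversal("start")
```

With these made-up rewards the shorter dialogue wins, so the chosen traversal asks for the location first and grounds once at the end; different reward values would favor grounding earlier.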
