Learning for Dialogue
Frame-Based Models
  User: I’d like to schedule an appointment.
  [Calls up appointment frame]
    People: ____  Time: ____  Location: ____
  System: Who is the other party?
  User: John, sometime on Tuesday.
    People: John  Time: Tuesday  Location: ____
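The appointment frame in this exchange can be sketched as a small data structure. This is only an illustrative sketch: the class name `AppointmentFrame` and the methods `fill` and `missing` are assumptions made for the example, not part of the lecture.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical class; slot names follow the slide's frame (People, Time, Location).
@dataclass
class AppointmentFrame:
    people: Optional[str] = None
    time: Optional[str] = None
    location: Optional[str] = None

    def fill(self, **slots: str) -> None:
        """Fill every slot mentioned in the user's turn."""
        for name, value in slots.items():
            if name not in vars(self):
                raise KeyError(f"unknown slot: {name}")
            setattr(self, name, value)

    def missing(self) -> List[str]:
        """Names of the slots the system still has to ask about."""
        return [name for name, value in vars(self).items() if value is None]


frame = AppointmentFrame()                  # "I'd like to schedule an appointment."
frame.fill(people="John", time="Tuesday")   # "John, sometime on Tuesday"
print(frame.missing())                      # ['location']
```

The list returned by `missing` is what a dialogue manager would consult to decide which question to ask next.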
Grounding in Frames
  After each slot is filled:
    User: John, sometime on Tuesday.
    System: OK, where do you want to meet John on Tuesday?
  After the entire form is finished:
    System: OK, I am scheduling you in Room 332 with John on Tuesday at 4pm.
How should we decide?
  By fiat? Always at the end, or always after a prompt: not flexible.
  User studies? Examine which is preferred: huge time cost for developers.
Other similar decisions
  What is the appropriate order to ask questions?
  How to resolve conflicts?
  When to combine two questions?
Learn to Interpret Responses
  Map words to semantics. E.g. “Yes,” “Yeah,” and “Uh-huh” are all positive.
  Learn how to extract information from a complex statement, e.g. “Anytime after 2pm.” or “2pm or later, please.”
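To make the target of this learning concrete, here is a hand-written sketch of the mapping a system would ideally learn. The lexicon, the two regular expressions, and the function names `is_positive` and `earliest_hour` are all assumptions for illustration; they cover only the example phrasings above, which is exactly the brittleness that motivates learning the mapping instead.

```python
import re
from typing import Optional

# Hypothetical hand-written lexicon; a learned system would induce it from data.
POSITIVE = {"yes", "yeah", "uh-huh"}


def is_positive(utterance: str) -> bool:
    """True if the normalized utterance is a known positive response."""
    return utterance.strip().strip(".!,").lower() in POSITIVE


# Two hand-written patterns covering only the example phrasings.
_AFTER = re.compile(r"after\s+(\d{1,2})\s*(am|pm)", re.IGNORECASE)
_OR_LATER = re.compile(r"(\d{1,2})\s*(am|pm)\s+or\s+later", re.IGNORECASE)


def earliest_hour(utterance: str) -> Optional[int]:
    """Earliest acceptable hour on a 24-hour clock, or None if not recognized."""
    match = _AFTER.search(utterance) or _OR_LATER.search(utterance)
    if match is None:
        return None
    hour = int(match.group(1)) % 12          # treat 12am/12pm as hour 0 before offset
    return hour + 12 if match.group(2).lower() == "pm" else hour


print(is_positive("Uh-Huh"))                   # True
print(earliest_hour("Anytime after 2pm."))     # 14
print(earliest_hour("2pm or later, please."))  # 14
```

Both phrasings map to the same constraint, which is the point: different surface forms, one piece of semantics.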
Learn On-line
  Ideally, the computer would learn over time, as it takes part in more dialogues.
Lecture Outline
  Reinforcement Learning
  Markov Decision Processes
Learning Frameworks
  Unsupervised Learning: raw data, no markup. E.g. clustering words.
  Supervised Learning: batch, data already marked up. E.g. tagging.
Online Learning
  Start with an initial model.
  Act with the behavior predicted by the model.
  As input comes in from the world, adapt the model to match what the world gives.
  Related to “Active Learning”.
Reinforcement Learning
  Along with input from the outside world there is a signal: good or bad.
  Learn whether to say “Hello” or “Hiyaz”.
  Possible policies: .9 Hello / .1 Hiyaz; .5 Hello / .5 Hiyaz; .1 Hello / .9 Hiyaz
Reinforcement Learning
  Positive reinforcement rewards good behavior.
  “Hello” -> “Good!” -> “Hello”
Reinforcement Learning
  Negative reinforcement punishes bad behavior.
  “Hiyaz” -> “Bad!” -> “Hello”
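One way to picture how the good/bad signal could move the greeting probabilities is the sketch below. The update rule (a simple step toward or away from the chosen greeting) and the learning rate are assumptions chosen for illustration; the lecture does not specify a particular rule.

```python
def update(p_hello: float, said: str, reward: float, rate: float = 0.4) -> float:
    """Return the new P("Hello") after one piece of feedback.

    A positive reward moves probability toward the greeting that was used;
    a negative reward moves it toward the other greeting.
    """
    target = 1.0 if said == "Hello" else 0.0
    if reward < 0:
        target = 1.0 - target  # punishment pushes toward the alternative
    return p_hello + rate * (target - p_hello)


p = 0.5                                # start undecided: .5 Hello / .5 Hiyaz
p = update(p, "Hello", reward=+1.0)    # "Hello" -> "Good!"
print(round(p, 2))                     # 0.7
p = update(p, "Hiyaz", reward=-1.0)    # "Hiyaz" -> "Bad!"
print(round(p, 2))                     # 0.82
```

Because each update is a weighted average of the old probability and 0 or 1, the value always stays a valid probability.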
More Complicated World
  Say “Good Morning” in the morning, “Good night” at night.
  Morning: “Good Morning”
  Night: “Good night”
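When is it Day? When is it Night?
  [Diagram: Day and Night, with observations “Dark out” and “Getting light”]
Reward good behavior
  Day: “Morning”
  Night: “Night”
  [Diagram observations: Dark, Light]
Punish bad behavior
  Day: “Night”
  Night: “Morning”
  [Diagram observations: Dark, Light]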
But how can this be formalized?
State Space
  States: Day, Night
State Transitions
  Transitions between Day and Night
Observations
  States: Day, Night
  Observations: Dark, Light
Actions
  Day: “Morning” or “Night”
  Night: “Night” or “Morning”
  [Diagram observations: Dark, Light]
Policy
  Day: .9 “Morning”, .1 “Night”
  Night: .9 “Night”
  [Diagram observations: Dark, Light]
Rewards
  Day: “Morning” or “Night”
  Night: “Night” or “Morning”
  [Diagram observations: Dark, Light]
MDP Framework
  Given a fixed state space with observable transitions, determine the policy that maximizes the rewards.
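For the Day/Night greeting example, the pieces of this framework can be written down directly. The sketch below is illustrative only: the reward values (+1 for the fitting greeting, -1 otherwise) are assumed, and it further assumes that the greeting does not influence whether the next state is Day or Night, so choosing the highest immediate reward in each state is enough. In a general MDP, where actions change the state, that shortcut does not hold.

```python
from typing import Dict

# Assumed reward values: +1 for the appropriate greeting, -1 otherwise.
REWARDS: Dict[str, Dict[str, float]] = {
    "Day":   {"Morning": 1.0, "Night": -1.0},
    "Night": {"Morning": -1.0, "Night": 1.0},
}


def greedy_policy(rewards: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """For each state, pick the action with the highest immediate reward."""
    return {state: max(actions, key=actions.get) for state, actions in rewards.items()}


def expected_reward(action_probs: Dict[str, float], state: str) -> float:
    """Expected reward in `state` for a stochastic choice over actions."""
    return sum(prob * REWARDS[state][action] for action, prob in action_probs.items())


print(greedy_policy(REWARDS))   # {'Day': 'Morning', 'Night': 'Night'}

# A stochastic Day policy of .9 "Morning" / .1 "Night" versus the deterministic one.
print(round(expected_reward({"Morning": 0.9, "Night": 0.1}, "Day"), 2))  # 0.8
print(expected_reward({"Morning": 1.0, "Night": 0.0}, "Day"))            # 1.0
```

Under these assumed rewards, putting all probability on the fitting greeting yields a higher expected reward than the .9/.1 mix.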
Optimal Policy
  Day: 1 “Morning”, 0 “Night”
  Night: 1 “Night”
  [Diagram observations: Dark, Light]
An alternative view
  Actions cause a movement in the state space.
  Rewards are allocated by state.
  If you end up in a particular state, you get a certain reward.
Actions cause State Changes
  [Diagram: from the frame People: John, Time: Tuesday, Location: ___, the actions “Ask Location” and “Ground John/Tuesday” lead to new frame states, e.g. Location: Building 12 filled in, or the Grounded box marked X.]
Optimal Policy Determines a State Space Traversal
  Taking an action is choosing a particular state, associated with a specific reward.
  [Diagram: frame states People: John, Time: Tuesday, Location: ___ and People: John, Time: Tuesday, Location: Building 12, each with a Grounded box that may be marked X.]
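The idea that a policy picks out a path through frame states can be sketched with a tiny deterministic state graph and value iteration. Everything specific here is an assumption for illustration: the state names, the reward numbers (a small penalty for intermediate states, a larger one for holding ungrounded information, a bonus for the finished frame), and the discount factor. None of these come from the lecture.

```python
from typing import Dict, List

# Hypothetical states: each is one configuration of the appointment frame.
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "start":           {"ask_location": "location_filled", "ground": "grounded"},
    "location_filled": {"ground": "done"},
    "grounded":        {"ask_location": "done"},
    "done":            {},  # terminal: location filled and everything grounded
}

# Hypothetical rewards, allocated by the state you end up in.
REWARDS: Dict[str, float] = {
    "start": 0.0, "location_filled": -2.0, "grounded": -1.0, "done": 10.0,
}

GAMMA = 0.9  # assumed discount factor


def value_iteration(sweeps: int = 50) -> Dict[str, float]:
    """Estimate the discounted future reward obtainable from each state."""
    values = {state: 0.0 for state in TRANSITIONS}
    for _ in range(sweeps):
        for state, actions in TRANSITIONS.items():
            if actions:  # terminal states keep value 0
                values[state] = max(
                    REWARDS[nxt] + GAMMA * values[nxt] for nxt in actions.values()
                )
    return values


def best_action(state: str, values: Dict[str, float]) -> str:
    """Action whose resulting state has the best reward plus discounted value."""
    actions = TRANSITIONS[state]
    return max(actions, key=lambda a: REWARDS[actions[a]] + GAMMA * values[actions[a]])


def traverse(state: str, values: Dict[str, float]) -> List[str]:
    """Follow the best action from each state until a terminal state."""
    path = [state]
    while TRANSITIONS[state]:
        state = TRANSITIONS[state][best_action(state, values)]
        path.append(state)
    return path


values = value_iteration()
print(best_action("start", values))   # ground
print(traverse("start", values))      # ['start', 'grounded', 'done']
```

With these assumed rewards the traversal grounds John/Tuesday before asking for the location; changing the reward numbers changes the path, which is how the grounding-strategy question from earlier in the lecture turns into something that can be learned.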