1
Learning for Dialogue
2
Frame-Based Models
User: I’d like to schedule an appointment.
(Calls up appointment frame) People: ____ | Time: ____ | Location: ____
System: Who is the other party?
User: John, sometime on Tuesday.
(Frame is updated) People: John | Time: Tuesday | Location: ____
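The slot-filling idea on this slide can be sketched in a few lines of Python. This is only an illustration: the class name `AppointmentFrame` and the method `unfilled` are hypothetical, and the three slots are the ones shown in the frame above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AppointmentFrame:
    """Hypothetical frame with the three slots shown on the slide."""
    people: Optional[str] = None
    time: Optional[str] = None
    location: Optional[str] = None

    def unfilled(self) -> List[str]:
        # Slots the system still has to ask about, in declaration order.
        return [name for name, value in vars(self).items() if value is None]


# "John, sometime on Tuesday" fills two slots at once.
frame = AppointmentFrame()
frame.people = "John"
frame.time = "Tuesday"
remaining = frame.unfilled()  # only the location is still missing
```

A dialogue manager could use the list of unfilled slots to decide which question to ask next.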
3
Grounding in Frames
After each slot is filled:
User: John, sometime on Tuesday.
System: OK, where do you want to meet John on Tuesday?
After the entire form is finished:
System: OK, I am scheduling you in Room 332 with John on Tuesday at 4pm.
4
How should we decide?
By fiat? Always at the end, or always after a prompt: not flexible.
User studies? Examine which is preferred: huge time cost for developers.
5
Other similar decisions
What is the appropriate order to ask questions? How to resolve conflicts? When to combine two questions?
6
Learn to Interpret Responses
Map words to semantics. E.g. “Yes”, “Yeah”, “Uh-huh” are all positive.
Learn how to extract information from a complex statement: “Anytime after 2pm.” “2pm or later, please.”
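For contrast with a learned interpreter, here is a hand-written baseline of the kind that learning would replace. It is a sketch only: the word list, the regular expression, and the function names `is_positive` and `earliest_hour` are assumptions made for illustration, and the rules cover just the example phrases on this slide.

```python
import re

# Assumed list of positive responses, taken from the slide's examples.
POSITIVE = {"yes", "yeah", "uh-huh"}

# Matches times such as "2pm" or "11 am".
TIME = re.compile(r"(\d{1,2})\s*(am|pm)")


def is_positive(utterance: str) -> bool:
    """True if any token of the utterance is a known positive word."""
    tokens = re.findall(r"[a-z\-]+", utterance.lower())
    return any(token in POSITIVE for token in tokens)


def earliest_hour(utterance: str):
    """Return the lower-bound hour (24-hour clock), or None if not found."""
    text = utterance.lower()
    match = TIME.search(text)
    if match is None or ("after" not in text and "later" not in text):
        return None
    hour = int(match.group(1)) % 12
    if match.group(2) == "pm":
        hour += 12
    return hour
```

Both example statements map to the same meaning, which is the point of the slide: `earliest_hour("Anytime after 2pm.")` and `earliest_hour("2pm or later, please.")` return the same hour.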
7
Learn On-line
Ideally, the computer would learn over time, as it takes part in more dialogues.
8
Lecture Outline
Reinforcement Learning
Markov Decision Processes
9
Learning Frameworks
Unsupervised Learning: raw data, no markup. E.g. clustering words.
Supervised Learning: batch, data already marked up. E.g. tagging.
10
Online Learning
11
Online Learning
Start with an initial model.
Act with a certain behavior predicted by the model.
As input comes in from the world, adapt the model to match what the world gives.
Related to “Active Learning”.
12
Reinforcement Learning
Along with input from the outside world there is a signal: good or bad.
Learn whether to say “Hello” or “Hiyaz”.
Example probabilities: (.9 Hello, .1 Hiyaz), (.5 Hello, .5 Hiyaz), (.1 Hello, .9 Hiyaz)
13
Reinforcement Learning
Positive reinforcement rewards good behavior: “Hello” -> “Good!” -> “Hello”
14
Reinforcement Learning
Negative reinforcement punishes bad behavior: “Hiyaz” -> “Bad!” -> “Hello”
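One simple way to turn the good/bad signal into a change of behavior is to shift probability toward a rewarded greeting and away from a punished one. The sketch below is an assumption, not the method used in the lecture: the function name `reinforce`, the learning rate of 0.5, and the proportional redistribution over the other actions are all illustrative choices.

```python
def reinforce(policy, action, reward, rate=0.5):
    """Return a new policy after one good (+) or bad (-) signal for `action`."""
    updated = dict(policy)
    if reward > 0:
        # Move a fraction of the remaining probability onto the rewarded action.
        updated[action] += rate * (1.0 - policy[action])
    else:
        # Take a fraction of the probability away from the punished action.
        updated[action] -= rate * policy[action]
    # Rescale the other actions so that the probabilities still sum to 1.
    others = [a for a in policy if a != action]
    rest_old = sum(policy[a] for a in others)
    rest_new = 1.0 - updated[action]
    for a in others:
        if rest_old > 0:
            updated[a] = rest_new * policy[a] / rest_old
        else:
            updated[a] = rest_new / len(others)
    return updated


policy = {"Hello": 0.5, "Hiyaz": 0.5}
policy = reinforce(policy, "Hello", +1)  # the world says "Good!"
policy = reinforce(policy, "Hiyaz", -1)  # the world says "Bad!"
```

After these two signals the policy favors “Hello”, moving from the .5/.5 starting point toward the .9/.1 end of the range shown earlier.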
15
More Complicated World
Say “Good Morning” in the morning, “Good night” at night.
Morning -> “Good Morning”
Night -> “Good night”
16
When is it Day? When is it Night?
[Diagram labels: Dark out, Getting Light, Night]
17
Reward good behavior
Day: “Morning”
Night: “Night”
[Diagram: Day and Night, linked by Dark and Light]
18
Punish bad behavior
Day: “Night”
Night: “Morning”
[Diagram: Day and Night, linked by Dark and Light]
19
But how can this be formalized?
20
Lecture Outline
Reinforcement Learning
Markov Decision Processes
21
State Space
States: Day, Night
22
State Transitions
Transitions: between Day and Night
23
Observations
States: Day, Night
Observations: Dark, Light
24
Actions
Day: “Morning”, “Night”
Night: “Night”, “Morning”
[Diagram: Day and Night, linked by Dark and Light]
25
Policy
Day: .9 “Morning”, .1 “Night”
Night: .9 “Night”
[Diagram: Day and Night, linked by Dark and Light]
26
Rewards
Day: “Morning”, “Night”
Night: “Night”, “Morning”
[Diagram: Day and Night, linked by Dark and Light]
27
MDP Framework
Given a fixed state space with observable transitions, determine the policy that maximizes the rewards.
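For the Day/Night example, the framework can be sketched as a reward table plus a search for the best action in each state. The reward values (+1 for the fitting greeting, -1 for the wrong one) are assumptions, as are the function names and the .1 “Morning” entry completing the Night row of the learned policy. Because the greeting does not influence whether it is day or night, maximizing the reward in each state separately is enough here; a general MDP would also have to account for where each action leads.

```python
STATES = ["Day", "Night"]
ACTIONS = ["Morning", "Night"]

# Assumed reward table: +1 for the fitting greeting, -1 for the wrong one.
REWARD = {
    ("Day", "Morning"): 1.0, ("Day", "Night"): -1.0,
    ("Night", "Night"): 1.0, ("Night", "Morning"): -1.0,
}


def expected_reward(policy, state):
    """Expected one-step reward of a stochastic policy in `state`."""
    return sum(p * REWARD[(state, a)] for a, p in policy[state].items())


def optimal_policy():
    """Put all probability on the best-rewarded action in each state."""
    best = {s: max(ACTIONS, key=lambda a: REWARD[(s, a)]) for s in STATES}
    return {s: {a: 1.0 if a == best[s] else 0.0 for a in ACTIONS}
            for s in STATES}


# A partly learned policy (the Night row's .1 "Morning" is an assumption).
learned = {
    "Day": {"Morning": 0.9, "Night": 0.1},
    "Night": {"Night": 0.9, "Morning": 0.1},
}
best = optimal_policy()
```

Comparing `expected_reward(learned, "Day")` with `expected_reward(best, "Day")` shows why the deterministic policy is preferred under these assumed rewards.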
28
Optimal Policy
Day: 1 “Morning”, 0 “Night”
Night: 1 “Night”
[Diagram: Day and Night, linked by Dark and Light]
29
An alternative view
Actions cause a movement in the state space.
Rewards are allocated by state: if you end up in a particular state, you get a certain reward.
30
Actions cause State Changes
[Diagram: a frame (People: John, Time: Tuesday, Location: ___) with two outgoing actions, “Ask Location” and “Ground John/Tuesday”. One leads to a frame with Location: Building 12, the other to the same frame marked as grounded. Each frame carries a “Grounded” flag.]
31
Optimal Policy Determines a State Space Traversal
Taking an action is choosing a particular state, associated with a specific reward.
[Diagram: the two successor frames, (People: John, Time: Tuesday, Location: ___) and (People: John, Time: Tuesday, Location: Building 12), each with a “Grounded” flag.]
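The state-based view can be sketched as value iteration over a small graph of frame states. Everything concrete here is hypothetical: the state names, the reward attached to each state, and the choice of a discount of 1.0. Only the two actions, asking for the location and grounding John/Tuesday, come from the slides.

```python
# Deterministic transitions: state -> {action: next state}.
TRANSITIONS = {
    "partial": {"ask_location": "full", "ground": "partial_grounded"},
    "partial_grounded": {"ask_location": "full_grounded"},
    "full": {"ground": "full_grounded"},
    "full_grounded": {},  # terminal: every slot filled and grounded
}

# Hypothetical rewards received on entering each state.
STATE_REWARD = {
    "partial": 0.0,
    "partial_grounded": -1.0,
    "full": -2.0,
    "full_grounded": 10.0,
}


def value_iteration(transitions, state_reward, gamma=1.0, sweeps=10):
    """Value of a state = best (reward of next state + discounted value)."""
    values = {s: 0.0 for s in transitions}
    for _ in range(sweeps):
        for s, moves in transitions.items():
            if moves:
                values[s] = max(state_reward[n] + gamma * values[n]
                                for n in moves.values())
    return values


def best_policy(transitions, state_reward, values, gamma=1.0):
    """Pick, in every non-terminal state, the action with the best outcome."""
    return {
        s: max(moves, key=lambda a: state_reward[moves[a]] + gamma * values[moves[a]])
        for s, moves in transitions.items() if moves
    }


values = value_iteration(TRANSITIONS, STATE_REWARD)
policy = best_policy(TRANSITIONS, STATE_REWARD, values)
```

With these assumed rewards the policy grounds John/Tuesday before asking for the location; different reward values would yield a different traversal, which is how the grounding decision posed at the start of the lecture could be learned rather than fixed by fiat.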