Learning for Dialogue.

Frame-Based Models

User: I'd like to schedule an appointment.
System: Who is the other party?
User: John, sometime on Tuesday.

This calls up an appointment frame, whose slots fill in as the dialogue proceeds:

Before:  People: ___    Time: ___      Location: ___
After:   People: John   Time: Tuesday  Location: ___
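A frame like this can be sketched in a few lines of Python (the class and slot names are illustrative, not prescribed by the slides):

```python
class AppointmentFrame:
    """Appointment frame with People / Time / Location slots."""
    SLOTS = ("people", "time", "location")

    def __init__(self):
        # Every slot starts out empty.
        self.slots = {slot: None for slot in self.SLOTS}

    def fill(self, slot, value):
        self.slots[slot] = value

    def next_empty_slot(self):
        """Return the first unfilled slot, or None if the form is finished."""
        for slot in self.SLOTS:
            if self.slots[slot] is None:
                return slot
        return None

frame = AppointmentFrame()
frame.fill("people", "John")
frame.fill("time", "Tuesday")
print(frame.next_empty_slot())  # -> location
```

The system's next question can then be driven directly by `next_empty_slot()`.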

Grounding in Frames

After each slot is filled:
User: John, sometime on Tuesday.
System: Ok, where do you want to meet John on Tuesday?

After the entire form is finished:
System: Ok, I am scheduling you in Room 332 with John on Tuesday at 4pm.
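The two grounding strategies can be contrasted in a small sketch (the function names and confirmation wording are hypothetical):

```python
def ground_per_slot(frame_items):
    """Confirm each slot as soon as it is filled: one utterance per slot."""
    return [f"Ok, {slot} = {value}." for slot, value in frame_items]

def ground_at_end(frame_items):
    """Confirm everything once, after the whole form is finished."""
    summary = ", ".join(f"{slot} = {value}" for slot, value in frame_items)
    return [f"Ok, I have {summary}."]

items = [("people", "John"), ("time", "Tuesday"), ("location", "Room 332")]
print(len(ground_per_slot(items)), len(ground_at_end(items)))  # 3 1
```

Per-slot grounding costs more turns but catches errors early; end-of-form grounding is terser but forces the user to re-check everything at once.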

How should we decide?

By fiat? Always grounding at the end, or always after a prompt, is not flexible.
User studies? Examining which strategy users prefer carries a huge time cost for developers.

Other similar decisions

What is the appropriate order in which to ask questions? How should conflicts be resolved? When should two questions be combined?

Learn to Interpret Responses

Map words to semantics: e.g. "Yes", "Yeah", and "Uh-huh" are all positive.
Learn how to extract information from a complex statement: "Anytime after 2pm." or "2pm or later, please."
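A minimal sketch of the word-to-semantics mapping, assuming a simple lookup table of affirmative forms (the table contents are illustrative):

```python
# Surface forms that all map to the same "positive" meaning.
AFFIRMATIVES = {"yes", "yeah", "uh-huh", "sure", "ok"}

def interpret_polarity(utterance):
    """Map a short response onto a semantic polarity via table lookup."""
    token = utterance.strip().lower().rstrip(".!,")  # normalize case and punctuation
    return "positive" if token in AFFIRMATIVES else "unknown"

print(interpret_polarity("Uh-Huh"))  # positive
```

Extracting a constraint like "anytime after 2pm" needs real parsing rather than a lookup table, which is exactly why learning the mapping is attractive.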

Learn On-line Ideally, the computer would learn over time, as it has more dialogues.

Lecture Outline

Reinforcement Learning
Markov Decision Processes

Learning Frameworks

Unsupervised learning: raw data without markup, e.g. clustering words.
Supervised learning: a batch of data already marked up, e.g. tagging.

Online Learning

Start with an initial model.
Act with the behavior predicted by the model.
As input comes in from the world, adapt the model to match what the world gives.
Related to "active learning".
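The loop above can be sketched as a running estimate that adapts after every input; the scalar model and step size here are illustrative stand-ins for a real dialogue model:

```python
def online_mean(observations, initial=0.0, step=0.1):
    """Maintain a running estimate, nudging it toward each new observation."""
    estimate = initial  # start with an initial model
    for x in observations:
        estimate += step * (x - estimate)  # adapt the model to the world's input
    return estimate

print(round(online_mean([1.0] * 100), 3))  # 1.0
```

The same act-observe-update pattern carries over when the "model" is a dialogue policy rather than a single number.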

Reinforcement Learning

Along with input from the outside world there is a signal: good or bad. For example, learn whether to say "Hello" or "Hiyaz". Starting from .5 Hello / .5 Hiyaz, the policy drifts toward .9 Hello / .1 Hiyaz if "Hello" is rewarded, or toward .1 Hello / .9 Hiyaz if "Hiyaz" is.

Reinforcement Learning

Positive reinforcement rewards good behavior: the system says "Hello", receives "Good!", and keeps saying "Hello".

Reinforcement Learning

Negative reinforcement punishes bad behavior: the system says "Hiyaz", receives "Bad!", and switches to "Hello".

A More Complicated World

Say "Good morning" in the morning and "Good night" at night.

When is it Day? When is it Night?

The system must tell day from night using cues such as whether it is dark out or getting light.

Reward good behavior: saying "Morning" during the day and "Night" at night earns a reward.

Punish bad behavior: saying "Night" during the day or "Morning" at night is punished.

But how can this be formalized?

Lecture Outline

Reinforcement Learning
Markov Decision Processes

State Space

The states: Day and Night.

State Transitions

Transitions connect the states: Day passes into Night, and Night back into Day.

Observations

Each state produces observations: Light during the Day, Dark at Night.

Actions

In each state the system can act: say "Morning" or say "Night".

Policy

A policy assigns probabilities to the actions in each state, e.g. in the Day state say "Morning" with probability .9 and "Night" with probability .1; in the Night state say "Night" with probability .9.

Rewards

Rewards attach to actions taken in states: saying "Morning" during the Day and "Night" at Night are the rewarded choices.

MDP Framework

Given a state space with observable transitions, determine the policy that maximizes reward.
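For the toy day/night world above, value iteration recovers the optimal policy. The transition model, reward values, and discount factor in this sketch are illustrative assumptions, not numbers from the slides:

```python
STATES = ("day", "night")
ACTIONS = ("say_morning", "say_night")

# Deterministic toy dynamics: time passes regardless of what is said.
NEXT = {"day": "night", "night": "day"}

def reward(state, action):
    """+1 for the greeting that matches the state, -1 otherwise."""
    correct = {"day": "say_morning", "night": "say_night"}
    return 1.0 if action == correct[state] else -1.0

def value_iteration(gamma=0.9, iters=100):
    """Iterate the Bellman optimality update, then read off the greedy policy."""
    V = {s: 0.0 for s in STATES}
    for _ in range(iters):
        V = {s: max(reward(s, a) + gamma * V[NEXT[s]] for a in ACTIONS)
             for s in STATES}
    policy = {s: max(ACTIONS, key=lambda a: reward(s, a) + gamma * V[NEXT[s]])
              for s in STATES}
    return V, policy

V, policy = value_iteration()
print(policy)  # {'day': 'say_morning', 'night': 'say_night'}
```

With a +1/-1 reward and gamma = 0.9, the state values converge toward 1/(1 - 0.9) = 10.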

Optimal Policy

The optimal policy is deterministic: in the Day state say "Morning" with probability 1 (and "Night" with probability 0); in the Night state say "Night" with probability 1.

An alternative view

Actions cause movement in the state space. Rewards are allocated by state: if you end up in a particular state, you get that state's reward.

Actions cause State Changes

From a frame with People: John, Time: Tuesday, Location: ___, the action "Ground John/Tuesday" leads to one grounded state, while "Ask Location" leads toward a state in which the location (e.g. Building 12) has been filled in.

Optimal Policy Determines a State Space Traversal

Taking an action is choosing a particular state, associated with a specific reward.