Integrating Learning of Dialog Strategies and Semantic Parsing

Presentation transcript:

Integrating Learning of Dialog Strategies and Semantic Parsing Aishwarya Padmakumar, Jesse Thomason, Raymond Mooney Department of Computer Science The University of Texas at Austin

Natural Language Interaction with Robots With advances in AI, robots and various automated and semi-automated agents are increasingly becoming a part of our lives. For most users, the easiest way to communicate with them would be the same way they communicate with each other: in natural language.

Understanding Commands Go to Bob Brown’s office Bring coffee for Alice One of the basic things we would like to be able to do is to give high-level commands such as “Bring coffee for Alice” or “Go to Bob Brown’s office”, and have the robot understand what we want it to do.

Goal of this work Dialog Management Semantic Parsing The goal of this work is to improve understanding of such commands by combining work in semantic parsing and dialog policy learning.

Semantic Parsing “go to Alice’s office” Semantic parsing transforms the natural language utterance into a logical form that the agent is capable of interpreting.

Advantages of semantic parsing Complete command: “Bring coffee for Bob Brown” (action, patient, recipient). Noun phrase: “Bob Brown’s office”. Confirmation: “Yes, you’re right”. It can be used to identify the action and semantic roles in a complete command, and the same component can understand other types of responses such as noun phrases or confirmations.
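As a rough illustration, the same parser component might map each of these response types to a structured interpretation along the following lines (the frame layout and names here are hypothetical, not the system's actual logical forms):

```python
# Hypothetical structured interpretations a semantic parser might return
# for the three response types above (illustrative only; the actual
# system uses logical forms rather than flat dictionaries).
full_command = {
    "type": "command",
    "action": "bring",
    "patient": "coffee",
    "recipient": "bob_brown",
}

noun_phrase = {
    "type": "noun_phrase",        # e.g. answering "Whose office should I go to?"
    "value": "bob_brown_office",
}

confirmation = {
    "type": "confirmation",       # "Yes, you're right"
    "value": True,
}
```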

Advantages of semantic parsing “Alice”, “Alice Ashcraft”, “Ms Alice Ashcraft”; “Alice’s office”, “Alice Ashcraft’s office”, “Ms Alice Ashcraft’s office”. It also exploits the compositionality of language to avoid learning too many unnecessary lexical entries.
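A minimal sketch of what this buys, with a hypothetical lexicon: once the parser has meanings for a person's name and for the possessive "'s office", every combination of the two is interpretable without its own lexical entry.

```python
# Hypothetical lexicon: person phrases denote a person constant, and
# "'s office" denotes a function from a person to a location.
lexicon = {
    "alice": "alice",
    "alice ashcraft": "alice",
    "ms alice ashcraft": "alice",
    "'s office": lambda person: ("office", person),
}

def interpret_office_phrase(person_phrase):
    # Compose the two meanings instead of storing a separate entry for
    # "alice's office", "alice ashcraft's office", and so on.
    return lexicon["'s office"](lexicon[person_phrase])

print(interpret_office_phrase("ms alice ashcraft"))  # ('office', 'alice')
```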

Clarification Dialogs Obtain specific missing information (Thomason et al., 2015) Weak supervision for improving parser (Artzi et al., 2011; Thomason et al., 2015) An agent does not need to understand a command in a single step. Prior work has shown that engaging in a clarification dialog, where the agent is allowed to ask for specific missing information, improves command understanding performance. Further, such dialogs can provide weak supervision to continuously update the parser.
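To make the weak-supervision idea concrete, here is a hedged sketch of how a finished dialog could be turned into parser training data; the alignment actually used in the cited systems is more involved:

```python
def weak_supervision_pairs(dialog, confirmed_command):
    """dialog: list of (asked_role, user_utterance) pairs, where asked_role
    is 'full_command' or a specific semantic role the agent asked about.
    confirmed_command: dict of role -> value the user finally confirmed.
    Returns (utterance, target) pairs that could be added to the parser's
    training data. Illustrative only."""
    pairs = []
    for asked_role, utterance in dialog:
        if asked_role == "full_command":
            pairs.append((utterance, confirmed_command))              # whole frame
        elif asked_role in confirmed_command:
            pairs.append((utterance, confirmed_command[asked_role]))  # one slot
    return pairs
```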

Dialog policy Ask for full command / Ask for parameter / Confirm command. In such clarification dialogs the agent can perform different actions, such as asking for a full command, asking for a specific parameter value, or confirming a complete or partial command.

Dialog policy Mapping from dialog states to dialog actions. Dialog state: information from the dialog so far, including the probability of each command being what the user intends. Static policy: difficult to update as the parser changes, and the optimal policy may not be obvious to hand-code. Deciding which of these actions to take, and when, is done by a dialog policy. Prior work on clarification dialogs with semantic parsing does this using a set of if-then rules. For example, you can imagine using a threshold on parser confidence scores to decide whether the robot should confirm a command or directly execute it. The problem is that if the parser is being updated from dialogs, this threshold should ideally also be updated, and it would be difficult to keep doing that manually. Also, we might want to include other types of dialog actions, such as asking for new information to be added to the knowledge base (for example, “Where is Bob’s office?”), in which case it becomes less obvious what a good policy is.
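For concreteness, a hand-coded policy of the kind described above might look like the sketch below; the threshold values and action names are illustrative, not taken from the prior systems:

```python
def static_policy(parse_confidence, parse_is_complete,
                  confirm_threshold=0.8, execute_threshold=0.95):
    """Hypothetical if-then dialog policy driven by parser confidence.
    The hard-coded thresholds are the weak point: whenever the parser is
    retrained, its confidence distribution shifts and these numbers would
    have to be re-tuned by hand."""
    if not parse_is_complete:
        return "ask_for_parameter"
    if parse_confidence >= execute_threshold:
        return "execute_command"
    if parse_confidence >= confirm_threshold:
        return "confirm_command"
    return "ask_for_full_command"
```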

Background: Reinforcement Learning Reinforcement learning: learning for sequential tasks with only implicit feedback. Partially Observable Markov Decision Process (POMDP). [Diagram: agent with beliefs b_{t-1}, b_t; action a_t; observation o_t; environment states s_t, s_{t+1}.] Just like with the parser, we would like our policy to adapt continuously based on interactions with users. For this, we turn to reinforcement learning. Specifically, the model we use is a Partially Observable Markov Decision Process, or POMDP. An agent interacts with an environment. At time t, the agent is in state s_t, but this state is hidden from it; what it receives is a noisy version of s_t, called the observation o_t. Based on observations, the agent maintains a probability distribution over all states it could be in at time t, called the belief b_t. Based on b_t, the agent takes an action a_t, which causes the environment to transition to state s_{t+1}.
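The belief is maintained with the standard Bayesian update b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). A generic sketch (the array layout is an assumption, and this is not the paper's dialog-specific state tracker):

```python
import numpy as np

def belief_update(belief, action, observation, T, O):
    """Generic POMDP belief update.
    belief: shape (S,) distribution over states.
    T: shape (A, S, S), T[a, s, s_next] = P(s_next | s, a).
    O: shape (A, S, Obs), O[a, s_next, o] = P(o | s_next, a)."""
    predicted = belief @ T[action]              # sum_s b(s) * T[a, s, s']
    updated = O[action, :, observation] * predicted
    return updated / updated.sum()              # renormalize
```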

Clarification Dialogs as POMDPs States: What the user is trying to convey via the dialog (commands) Observations: Output of the natural language understanding component Actions: Dialog actions (confirming, requesting a parameter value)
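In this setting, a hypothetical instantiation of those ingredients could look like the following (the command vocabulary and action names are invented for illustration):

```python
from itertools import product

# Hidden state: the command the user actually intends (illustrative space).
actions = ["bring", "go_to"]
patients = ["coffee", "calendar", None]
recipients = ["alice", "bob_brown"]
states = list(product(actions, patients, recipients))

# Belief: probability of each candidate command, revised after every
# observation, i.e. after every parse of a user utterance.
belief = {s: 1.0 / len(states) for s in states}

# Dialog actions available to the agent at each turn.
dialog_actions = ["ask_full_command", "ask_parameter", "confirm_command"]
```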

Why is policy learning challenging? Standard POMDP policy learning assumes a constant observation probability distribution: the environment generating observations from hidden states does not change over time. In our system, the parser that produces the observations is itself being retrained, so the observation probability distribution varies over time and the environment is non-stationary.

Policy learning Kalman Temporal Differences (KTD) Q-Learning (Geist and Pietquin, 2010) is used to learn the policy in a non-stationary environment. Dialog state tracking: HIS model (Young et al., 2010). The policy is learned over features extracted from the belief state. Because of this, we need to choose our policy learning algorithm carefully. We use the Kalman Temporal Differences Q-Learning algorithm, as it allows us to effectively learn a policy in a non-stationary environment from dialogs collected offline. This lets us obtain training dialogs in batches via crowdsourcing.
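As a rough sketch of KTD-Q (following the general shape of Geist and Pietquin's algorithm but simplified, with a plain linear Q-function over belief features and illustrative hyperparameters; this is not the authors' implementation):

```python
import numpy as np

class KTDQ:
    """Simplified Kalman Temporal Differences Q-learning sketch.
    The Q-function parameters are treated as the hidden state of a Kalman
    filter; the reward is the noisy observation; an unscented transform
    handles the nonlinear max over next actions."""

    def __init__(self, n_features, n_actions, gamma=0.95,
                 evolution_noise=1e-3, reward_noise=1.0, kappa=1.0):
        self.n_features, self.n_actions, self.gamma = n_features, n_actions, gamma
        self.p = n_features * n_actions                 # parameter count
        self.theta = np.zeros(self.p)                   # Q weights
        self.P = np.eye(self.p)                         # parameter covariance
        self.Pv = evolution_noise * np.eye(self.p)      # evolution noise
        self.Pn = reward_noise                          # observation noise
        self.kappa = kappa

    def _phi(self, features, action):
        phi = np.zeros(self.p)                          # one feature block per action
        phi[action * self.n_features:(action + 1) * self.n_features] = features
        return phi

    def q_values(self, features, theta=None):
        theta = self.theta if theta is None else theta
        return np.array([theta @ self._phi(features, a)
                         for a in range(self.n_actions)])

    def update(self, features, action, reward, next_features, done):
        # Prediction step: random-walk evolution of the parameters.
        P_pred = self.P + self.Pv

        # Sigma points around the current parameter estimate.
        n = self.p
        L = np.linalg.cholesky((n + self.kappa) * P_pred + 1e-9 * np.eye(n))
        sigma = [self.theta] + [self.theta + L[:, i] for i in range(n)] \
                             + [self.theta - L[:, i] for i in range(n)]
        w = np.full(2 * n + 1, 1.0 / (2 * (n + self.kappa)))
        w[0] = self.kappa / (n + self.kappa)

        # Predicted "observation" g(theta) = Q(s, a) - gamma * max_a' Q(s', a').
        phi_sa = self._phi(features, action)
        preds = np.array([
            th @ phi_sa - (0.0 if done else
                           self.gamma * np.max(self.q_values(next_features, th)))
            for th in sigma])

        # Correction step: Kalman gain from the sigma-point statistics.
        r_hat = w @ preds
        P_tr = sum(w[j] * (sigma[j] - self.theta) * (preds[j] - r_hat)
                   for j in range(2 * n + 1))
        P_r = w @ (preds - r_hat) ** 2 + self.Pn
        K = P_tr / P_r
        self.theta = self.theta + K * (reward - r_hat)
        self.P = P_pred - P_r * np.outer(K, K)
```

The evolution-noise term keeps the parameter covariance from collapsing, which is what lets the learner keep adapting when the environment (here, the parser that generates observations) changes.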

Experiments - Mechanical Turk We conducted experiments on Mechanical Turk; our interface looked like this. In the prompt provided to users, we varied the suggested name (sometimes using nicknames or designations) and used visual prompts for the objects to avoid lexically priming users. We had a verification step to ensure that participants were human users who could understand the task, and we asked users to fill out a survey about their experience after completing the dialog.

Hypotheses Combined parser and dialog learning is more useful than either alone. Changes in the parser need to be seen by the policy learner.

Systems Parser Learning: the dialog policy is fixed at its initial version throughout; the parser is updated after each batch of collected dialogs, yielding the final parser. Dialog Learning: the parser is fixed at its initial version throughout; the policy is updated after each batch of collected dialogs, yielding the final policy.

Systems Parser and Dialog Learning - Full: all training dialogs are collected with the initial parser and initial policy, and both components are then updated once on all of that data. Parser and Dialog Learning - Batchwise: both the parser and the policy are updated after each batch of collected dialogs.

Experiments - Metrics Successful dialog: the user does not exit the dialog prematurely and the robot performs the correct action. Objective metrics: % successful dialogs; dialog length (average number of turns in successful dialogs). Subjective metrics: “The robot understood me”; “The robot asked sensible questions”.
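A trivial sketch of how the two objective metrics would be computed from logged dialogs (the field names are assumptions):

```python
def objective_metrics(dialogs):
    """dialogs: list of dicts with 'success' (bool: correct action and no
    premature exit) and 'num_turns' (int). Returns the % of successful
    dialogs and the average length of the successful ones."""
    successes = [d for d in dialogs if d["success"]]
    success_rate = 100.0 * len(successes) / len(dialogs)
    avg_length = sum(d["num_turns"] for d in successes) / len(successes)
    return success_rate, avg_length
```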

Results - Dialog Success Dialog success rates (%): 75, 59, 72, 78 (higher is better). Parser learning is mostly responsible for the improvement in dialog success rate. Best system: parser and dialog learning - batchwise. We compared four systems: only parser learning, only dialog learning, and two settings with both parser and dialog learning. The “batchwise” setting is where both components are updated after each batch of dialogs. In the “full” setting, we collect all four training batches at once with the initial components and perform one update using all these dialogs before testing. We can see that parser learning contributes much more towards dialog success, while dialog policy learning is more effective at reducing dialog length. We also see that training both components is more useful if done in a batchwise manner. The poor performance of the “full” setting indicates that the effect of a non-stationary environment is non-trivial.

Results - Dialog Length Average dialog lengths (turns): 12.43, 11.73, 12.76, 10.61 (lower is better). Dialog learning is mostly responsible for lowering dialog length. Best system: parser and dialog learning - batchwise.

Results - User Survey Responses Mean scores: 3.28, 3.17, 2.91, 2.79 (higher is better). Best system: parser and dialog learning - full. This metric can be conflated with the number of questions the system asked. Survey responses were on a 0-4 Likert scale. Again we see that performing both parser and dialog policy learning results in better scores, but the “full” system is not always better than the “batchwise” system.

Results - User Survey Responses Mean scores: 2.93, 2.55, 2.79, 3.30 (higher is better). Best system: parser and dialog learning - batchwise.

Conclusion Simultaneous training of a semantic parser and a dialog policy using implicit feedback from dialogs. Both batchwise > parser learning alone, dialog learning alone. Changes in other components must be propagated to the policy via implicit feedback to reduce the effects of a non-stationary environment. Both batchwise > both full.

Integrating Learning of Dialog Strategies and Semantic Parsing Aishwarya Padmakumar, Jesse Thomason, Raymond Mooney Department of Computer Science The University of Texas at Austin