Reinforcement Learning for Spoken Dialogue Systems: Comparing Strengths & Weaknesses for Practical Deployment
Tim Paek, Microsoft Research
Dialogue on Dialogues Workshop 06

Reinforcement Learning for SDS

Dialogue manager (DM) in spoken dialogue systems
– Selects actions given {observations & beliefs}
– Typically hand-crafted & knowledge-intensive
– RL seeks to formalize & optimize action selection
– Once the dialogue dynamics are represented as a Markov Decision Process (MDP), an optimal policy can be derived

MDP in a nutshell
– Input tuple (S, A, T, R) → output policy π
– Objective function: maximize the expected cumulative discounted reward, E[Σ_t γ^t r_t]
– Value function: V^π(s) = E[Σ_t γ^t r_t | s_0 = s, π], the expected return from state s when following policy π

– To what extent can RL really automate hand-crafted DM?
– Is RL practical for speech application development?
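To make "input tuple (S, A, T, R) → output policy" concrete, here is a minimal value-iteration sketch over a toy slot-filling MDP. Every state name, transition probability, and reward below is an invented illustration, not something from the talk.

```python
# Hypothetical toy slot-filling dialogue MDP; all numbers are illustrative.
states = ["no_slot", "slot_unconfirmed", "slot_confirmed"]
actions = ["ask", "confirm"]

# T[s][a] = list of (next_state, probability); R[s][a] = expected reward
T = {
    "no_slot":          {"ask":     [("slot_unconfirmed", 0.8), ("no_slot", 0.2)],
                         "confirm": [("no_slot", 1.0)]},
    "slot_unconfirmed": {"ask":     [("slot_unconfirmed", 1.0)],
                         "confirm": [("slot_confirmed", 0.7), ("no_slot", 0.3)]},
    "slot_confirmed":   {"ask":     [("slot_confirmed", 1.0)],
                         "confirm": [("slot_confirmed", 1.0)]},
}
R = {
    "no_slot":          {"ask": -1.0, "confirm": -2.0},  # each turn costs; premature confirms cost more
    "slot_unconfirmed": {"ask": -1.0, "confirm": 0.7 * 10.0 - 1.0},  # expected success bonus folded in
    "slot_confirmed":   {"ask": 0.0, "confirm": 0.0},    # absorbing success state
}

gamma = 0.95  # discount factor in the objective E[sum_t gamma^t r_t]
V = {s: 0.0 for s in states}
for _ in range(100):  # synchronous value iteration
    V = {s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a])
                for a in actions)
         for s in states}

# Greedy policy with respect to the converged value function
policy = {s: max(actions, key=lambda a: R[s][a] +
                 gamma * sum(p * V[s2] for s2, p in T[s][a]))
          for s in states}
print(policy)  # asks in no_slot, confirms in slot_unconfirmed
```

With these numbers the derived policy asks until a value is heard, then confirms; changing a single reward entry can flip that behavior, which previews the reward-sensitivity point on the next slide.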

Strengths & Weaknesses

Objective Function
– Strengths: optimization framework; explicitly defined & adjustable; can serve as an evaluation metric
– Weaknesses: unclear which dialogues can & cannot be modeled with a specifiable objective function; not easily adjusted

Reward Function
– Weaknesses: overall behavior very sensitive to changes in reward; mostly hand-crafted & tuned; not easily adjusted

State Space & Transition Function
– Strengths: once modeled as an MDP or POMDP, well-studied algorithms exist for deriving a policy
– Weaknesses: state space kept small due to algorithmic limitations; variable selection is still mostly manual; no best practices; no domain-independent state variables; Markov assumption; not easily adjusted

Policy
– Strengths: guaranteed to be optimal with respect to the data
– Weaknesses: can function as a black box; removes control from developers; not easily adjusted; no theoretical insight

Evaluation
– Strengths: can be rapidly trained and tested with user models
– Weaknesses: real user behavior may differ; testing on the same user model is cheating; a hand-crafted baseline policy should be tuned to the same objective function
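As a hedged illustration of the Evaluation row, the sketch below rolls two policies against the same toy user simulator and scores both with the same discounted objective. The simulator, policies, and reward numbers are all hypothetical, and note that reusing one user model for both training and testing is precisely the "cheating" caveat flagged above.

```python
import random

def run_episode(policy, max_turns=20, seed=None):
    """Roll one dialogue against a toy user simulator; return discounted reward."""
    rng = random.Random(seed)
    state, ret, discount, gamma = "no_slot", 0.0, 1.0, 0.95
    for _ in range(max_turns):
        action = policy[state]
        if state == "no_slot" and action == "ask":
            # Simulated user supplies the slot value 80% of the time
            reward = -1.0
            state_next = "slot_unconfirmed" if rng.random() < 0.8 else "no_slot"
        elif state == "slot_unconfirmed" and action == "confirm":
            # Simulated user accepts the confirmation 70% of the time
            if rng.random() < 0.7:
                reward, state_next = 9.0, "slot_confirmed"  # success bonus minus turn cost
            else:
                reward, state_next = -1.0, "no_slot"
        else:
            reward, state_next = -1.0, state  # wasted turn, no progress
        ret += discount * reward
        discount *= gamma
        state = state_next
        if state == "slot_confirmed":
            break
    return ret

learned  = {"no_slot": "ask", "slot_unconfirmed": "confirm", "slot_confirmed": "ask"}
baseline = {"no_slot": "ask", "slot_unconfirmed": "ask", "slot_confirmed": "ask"}  # deliberately poor

for name, pol in [("learned", learned), ("baseline", baseline)]:
    avg = sum(run_episode(pol, seed=i) for i in range(1000)) / 1000
    print(f"{name}: average return {avg:.2f}")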

Opportunities

Objective Function
– Open up the black boxes and optimize ASR / SLU using the same objective function as the DM

Reward Function
– Inverse reinforcement learning
– Adapt the reward function / policy based on user type / behavior (similar to adapting mixed initiative)

State Space & Transition Function
– Learn which state-space variables matter for local / long-term reward
– Apply more efficient POMDP methods
– Identify domain-independent state variables
– Identify best practices

Policy
– Online policy learning (explore vs. exploit; see the sketch below)
– Identify domain-independent, reusable error-handling mechanisms

Evaluation
– Close the gap between the user model and real user behavior
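For the online policy learning item, here is a minimal tabular sketch of the explore/exploit trade-off: epsilon-greedy Q-learning, where epsilon is the fraction of turns spent trying non-greedy actions to gather data. The env_reset / env_step hooks stand in for a live dialogue system or user simulator; they and all parameter defaults are assumptions for illustration, not anything from the talk.

```python
import random

def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the current Q."""
    if rng.random() < epsilon:
        return rng.choice(actions)                      # explore
    return max(actions, key=lambda a: Q[(state, a)])    # exploit

def q_learning(env_reset, env_step, states, actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Learn a policy online. env_reset() -> start state;
    env_step(state, action) -> (next_state, reward, done)."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        state, done = env_reset(), False
        while not done:
            action = epsilon_greedy(Q, state, actions, epsilon, rng)
            next_state, reward, done = env_step(state, action)
            # One-step temporal-difference update toward the bootstrapped target
            target = reward if done else reward + gamma * max(
                Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    # Greedy policy extracted from the learned Q-values
    return {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
```

Annealing epsilon toward zero over time is the usual refinement: explore heavily while the estimates are poor, then commit to the greedy policy as they converge.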

Any comments?