DQA meeting: 17.07.2007: Learning more effective dialogue strategies using limited dialogue move features Matthew Frampton & Oliver Lemon, Coling/ACL-2006.


DQA meeting: 17.07.2007: Learning more effective dialogue strategies using limited dialogue move features, Matthew Frampton & Oliver Lemon, Coling/ACL-2006
Presented by: Mark Hepple

Data-driven methodology for SDS Development
Broader context = realising a "data-driven" methodology for creating SDSs, with the following steps:
– 1. Collect data (using a prototype or WOZ)
– 2. Build a probabilistic user simulation from the data (covering user behaviour, ASR errors)
– 3. [Feature selection, using the user simulation]
– 4. Learn a dialogue strategy, using reinforcement learning over system interactions with the simulation

Task
Information-seeking dialogue systems:
– Specifically task-oriented, slot-filling dialogues, leading to a database query
– E.g. getting user requirements for a flight booking (c.f. the COMMUNICATOR task)
– The aim is to achieve an effective system strategy for such dialogue interactions

Reinforcement Learning
The system is modelled as a Markov Decision Process (MDP):
– models decision making in situations where outcomes are partly random, partly under system control
Reinforcement learning is used to learn an effective policy:
– a policy determines the best action to take in each situation
The aim is to maximize overall reward:
– this requires a reward function, assigning reward values to different dialogues
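As a concrete illustration of policy learning over such an MDP: the slides do not specify the learning algorithm used in the paper, so the following is a generic tabular Q-learning sketch with an epsilon-greedy policy; the state and action names are illustrative assumptions, not taken from the paper.

```python
# Sketch: tabular Q-learning over a slot-filling dialogue MDP.
# States and action names below are illustrative placeholders.
import random
from collections import defaultdict

ACTIONS = ["ask_open", "ask_slot", "confirm_slot", "implicit_confirm",
           "give_help", "pass_to_operator", "db_query"]

Q = defaultdict(float)              # Q[(state, action)] -> estimated return
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def choose_action(state):
    """Epsilon-greedy action selection over the fixed action set."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, done):
    """One-step Q-learning backup towards reward + discounted best next value."""
    target = reward if done else reward + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```

In training, the system would run many simulated dialogues against the user simulation, calling `choose_action` each turn and `q_update` on each observed transition.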

Action set of the dialogue system
– 1. Ask open question ("How may I help you?")
– 2. Ask value for slot 1..n
– 3. Explicitly confirm a slot 1..n
– 4. Ask for slot k, whilst implicitly confirming slot k-1 or k+1
– 5. Give help
– 6. Pass to human operator
– 7. Database query

Reward function
The reward function is "all-or-nothing":
– 1. DB query with all slots confirmed = +100
– 2. Any other DB query = -75
– 3. User simulation hangs up = -100
– 4. System passes to human operator = -50
– 5. Each system turn = -5
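The reward scheme above can be written down directly as a function of a dialogue's outcome. This is a minimal sketch; the helper name and argument layout are assumptions for illustration, using exactly the values from the slide.

```python
def dialogue_reward(turns, all_slots_confirmed, did_db_query,
                    hung_up, passed_to_operator):
    """Compute the all-or-nothing reward for a finished dialogue:
    a -5 penalty per system turn, plus one terminal outcome."""
    reward = -5 * turns
    if hung_up:
        reward += -100                              # simulation hangs up
    elif passed_to_operator:
        reward += -50                               # passed to human operator
    elif did_db_query:
        reward += 100 if all_slots_confirmed else -75
    return reward
```

For example, a 10-turn dialogue ending in a DB query with all slots confirmed scores -50 + 100 = 50, so shorter successful dialogues are preferred.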

N-Gram User Simulation
Employs the n-gram user simulation of Georgila, Lemon & Henderson:
– Derived from an annotated version of the COMMUNICATOR data
– Treats a dialogue as a sequence of pairs of DAs/tasks
– Outputs the next user "utterance" as a DA/task pair, based on the last n-1 pairs
– Incorporates the effects of ASR errors (built from user utterances as recognised by the ASR components of the original COMMUNICATOR systems)
– Separate 4-gram and 5-gram simulations are used for training/testing
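The core of such a simulation is an n-gram model over DA/task pairs: condition on the last n-1 pairs and sample the next one. The sketch below is a simplified assumption of how this works (no smoothing or backoff, and no ASR-error channel); the real simulation of Georgila et al. is trained on the annotated COMMUNICATOR data and models both.

```python
# Sketch: n-gram user simulation over (dialogue_act, task) pairs.
import random
from collections import defaultdict, Counter

class NgramUserSim:
    def __init__(self, n):
        self.n = n
        self.counts = defaultdict(Counter)   # history tuple -> Counter of next pairs

    def train(self, dialogues):
        """Each dialogue is a list of (dialogue_act, task) pairs."""
        for d in dialogues:
            for i in range(len(d)):
                history = tuple(d[max(0, i - self.n + 1):i])
                self.counts[history][d[i]] += 1

    def next_pair(self, history):
        """Sample the next user DA/task pair given the last n-1 pairs."""
        dist = self.counts[tuple(history[-(self.n - 1):])]
        if not dist:
            return None   # unseen history; a real model would back off
        pairs, weights = zip(*dist.items())
        return random.choices(pairs, weights=weights)[0]
```

Training two such models of different order (e.g. 4-gram and 5-gram) on the same data gives the separate train/test simulations described above.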

Key Question: what context features to use?
Past work has used only limited state information:
– based on the number/fill-status of slots
Proposal: include richer context information, specifically:
– the dialogue act (DA) of the last system turn
– the DA of the last user turn
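The difference between the baseline and augmented state spaces can be made concrete. Below is an illustrative sketch (the feature names and encoding are assumptions): the baseline state is just the slots' fill/confirm status, while the augmented states append the last user and/or system DA, so two situations with identical slot status can still be distinguished.

```python
# Sketch: state encodings for the baseline and the two augmented strategies.
def baseline_state(slots):
    """slots: dict mapping slot name -> 'empty' | 'filled' | 'confirmed'."""
    return tuple(sorted(slots.items()))

def strategy2_state(slots, last_user_da):
    """Baseline slot features plus the last user dialogue act."""
    return baseline_state(slots) + (("last_user_da", last_user_da),)

def strategy3_state(slots, last_user_da, last_system_da):
    """Strategy 2 features plus the last system dialogue act."""
    return strategy2_state(slots, last_user_da) + (("last_system_da", last_system_da),)
```

This is what drives the emergent behaviour noted later: after a failed slot-filling attempt the baseline state is unchanged, but the augmented state differs, so a different action can be learned.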

Experiments
Three systems are compared:
– Baseline: slot features only
– Strategy 2: slot features + last user DA
– Strategy 3: slot features + last user and system DAs
Train with the 4-gram and test with the 5-gram user simulation, and vice versa

Results
The main reported result is an improvement in the average reward level of dialogues for the new strategies, compared to the baseline:
– Strategy 2 improves over the baseline by 4.9%
– Strategy 3 improves over the baseline by 7.8%
All three strategies achieve 100% slot filling and confirmation
The augmented strategies also improve over the baseline w.r.t. average dialogue length

Qualitative Analysis
The system learns to:
– only query the DB when all slots are filled
– not pass to the operator
– use implicit confirmation where possible
Emergent behaviour:
– When the baseline system fails to fill/confirm a slot from user input, the state remains the same, so the system will repeat the same action
– For the augmented systems, the state changes (the last-DA features differ), so the system can learn to take a different action, e.g. ask about a different slot, or use the "give help" action

Questions/Comments
– What is the value of performance-improvement figures based on reward?
– Does improvement w.r.t. the reward function imply improvement for human-machine dialogues?
– Validity of the comparisons to the COMMUNICATOR systems
– Why does system performance improve? Is avoidance of repetition the key?