Dialogue Modelling Milica Gašić Dialogue Systems Group.


Why are current methods poor?

Dialogue as a Partially Observable Markov Decision Process (POMDP)
[Figure: graphical model with action $a_t$, states $s_t$ and $s_{t+1}$, reward $r_t$, and observations $o_t$ and $o_{t+1}$]
The state is unobservable and depends on the previous state and action: $P(s_{t+1} \mid s_t, a_t)$ is the transition probability. The state is seen only through a noisy observation: $P(o_t \mid s_t)$ is the observation probability. Action selection (the policy) is based on the distribution over all states at every time step $t$, the belief state $b(s_t)$.
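For reference, the belief state can be written as the posterior over the current state given the history of observations and actions. This is the standard POMDP definition. It is stated here as an assumption, since the slide names $b(s_t)$ without defining it:

$$ b(s_t) = P(s_t \mid o_1, \ldots, o_t,\ a_1, \ldots, a_{t-1}), \qquad \sum_{s_t} b(s_t) = 1 $$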

How to track belief state?

Belief propagation: probabilities conditional on the observations. We are interested in the marginal probabilities $p(x \mid D)$, where $D = \{D_a, D_b\}$.
[Figure: node $x$ with evidence $D_a$ on one side and $D_b$ on the other]
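The marginal splits into two pieces by one application of Bayes' rule. This assumes that $x$ separates the two groups of evidence, i.e. $D_a$ and $D_b$ are conditionally independent given $x$, as in a tree-structured network:

$$ p(x \mid D_a, D_b) = \frac{p(D_b \mid x, D_a)\, p(x \mid D_a)}{p(D_b \mid D_a)} = \frac{p(D_b \mid x)\, p(x \mid D_a)}{p(D_b \mid D_a)} \;\propto\; p(x \mid D_a)\, p(D_b \mid x) $$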

Belief propagation: split $D_b$ further into $D_c$ and $D_d$.
[Figure: node $x$ with evidence $D_a$ on one side and $D_c$, $D_d$ on the other]
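A similar assumption applies to the $D_b$ side: if $D_c$ and $D_d$ are conditionally independent given $x$, the message from that side factorises as well. Each branch can then be processed separately:

$$ p(D_b \mid x) = p(D_c, D_d \mid x) = p(D_c \mid x)\, p(D_d \mid x), \qquad p(x \mid D) \propto p(x \mid D_a)\, p(D_c \mid x)\, p(D_d \mid x) $$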

Belief propagation
[Figure: nodes $a$, $b$, $c$ with evidence $D_a$, $D_b$, $D_c$]

Belief propagation
[Figure: nodes $a$ and $b$ with evidence $D_a$ and $D_b$]

How to track belief state?
[Figure: POMDP graphical model with $a_t$, $s_t$, $s_{t+1}$, $r_t$, $o_t$, $o_{t+1}$]

Belief state tracking
[Figure: POMDP graphical model with $a_t$, $s_t$, $s_{t+1}$, $r_t$, $o_t$, $o_{t+1}$]
Updating the belief state requires a summation over all possible dialogue states at every dialogue turn. This is intractable for realistic state spaces.
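Here is a minimal sketch of exact belief tracking over a small discrete state space. It assumes NumPy arrays for the transition probability $P(s_{t+1} \mid s_t, a_t)$ and the observation probability $P(o_t \mid s_t)$. The function and argument names are hypothetical. It implements $b'(s') \propto P(o \mid s') \sum_s P(s' \mid s, a)\, b(s)$:

```python
import numpy as np


def belief_update(b, a, o, T, O):
    """Exact POMDP belief update over a discrete state space.

    b: current belief, shape (S,)
    a: index of the system action taken
    o: index of the observation received
    T: transition probabilities, T[a, s, s2] = P(s2 | s, a)
    O: observation probabilities, O[s2, o] = P(o | s2)
    """
    # Predict: sum over every previous state s (the O(S^2) step per turn)
    predicted = b @ T[a]  # predicted[s2] = sum_s b[s] * P(s2 | s, a)
    # Correct: weight each state by how well it explains the observation
    unnormalised = O[:, o] * predicted
    return unnormalised / unnormalised.sum()
```

The matrix-vector product is the summation over every dialogue state. It costs on the order of $|S|^2$ operations per turn, and that cost is what makes exact tracking intractable when the state space is large.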

Challenges in POMDP dialogue modelling: How to define the state space? How to tractably maintain the belief state? How to define the transition and observation probabilities?

How to represent dialogue state? The state needs to capture three things. First, what happened before: the dialogue history (Markov property). Second, what the user wants: the user goal (task-oriented dialogue). Third, what the user says: the user act (robustness to errors).

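Dialogue state factorisation: decompose the state into conditionally independent elements, $s_t = (g_t, u_t, d_t)$: the user goal $g_t$, the user action $u_t$ and the dialogue history $d_t$.
[Figure: graphical model in which $s_t$ is replaced by $g_t$, $u_t$, $d_t$ (and likewise at $t+1$), with action $a_t$, reward $r_t$ and observations $o_t$, $o_{t+1}$]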
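The factorised model can be written out following the HIS formulation. This is an assumption, since the slide shows only the graph. In this form, each element depends only on its parents in the network, and the observation depends only on the user act:

$$ P(s_{t+1} \mid s_t, a_t) = P(g_{t+1} \mid g_t, a_t)\, P(u_{t+1} \mid g_{t+1}, a_t)\, P(d_{t+1} \mid u_{t+1}, g_{t+1}, d_t, a_t), \qquad P(o_{t+1} \mid s_{t+1}) = P(o_{t+1} \mid u_{t+1}) $$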

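Belief update
[Figure: the factorised graphical model with $g_t$, $u_t$, $d_t$, $a_t$, $r_t$, $o_t$ and $g_{t+1}$, $u_{t+1}$, $d_{t+1}$, $o_{t+1}$]
The factorised belief update still requires a summation over all possible goals and a summation over all possible histories and user actions. Both are intractable.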
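The two sums can be made concrete with the factorised update $b'(g', u', d') \propto P(o' \mid u')\, P(u' \mid g', a) \sum_d P(d' \mid u', g', d, a) \sum_g P(g' \mid g, a)\, b(g, d)$, where $b(g, d) = \sum_u b(g, u, d)$. The sketch below is minimal. This HIS-style form is an assumption, because the slide's equation did not survive. The arrays are toy-sized and all names are hypothetical. The system action $a$ is held fixed, so it is folded into the arrays:

```python
import numpy as np


def factored_belief_update(b, P_goal, P_user, P_hist, P_obs, o):
    """Belief update for a state factorised as s = (g, u, d).

    All arrays are for one fixed system action a:
    b:      b[g, u, d], current joint belief
    P_goal: P_goal[g, g2]         = P(g2 | g, a)
    P_user: P_user[g2, u2]        = P(u2 | g2, a)
    P_hist: P_hist[d, u2, g2, d2] = P(d2 | u2, g2, d, a)
    P_obs:  P_obs[u2, o]          = P(o | u2)
    """
    b_gd = b.sum(axis=1)  # b(g, d): marginalise out the previous user act
    # Inner sum over every previous goal g
    goal_pred = np.einsum("gd,gh->hd", b_gd, P_goal)  # indexed [g2, d]
    # Outer sum over every previous history d
    hist_pred = np.einsum("hd,duhe->hue", goal_pred, P_hist)  # [g2, u2, d2]
    # Weight by the user act model and the observation likelihood
    new_b = hist_pred * P_user[:, :, None] * P_obs[:, o][None, :, None]
    return new_b / new_b.sum()
```

Both `einsum` calls still range over every goal and every history, so the cost grows with the size of the full goal space. These are exactly the two sums that the HIS and BUDS systems approximate.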

Dialogue models for real-world dialogue systems: the Hidden Information State (HIS) system and the Bayesian Update of Dialogue State (BUDS) system.

Hidden Information State system: a real-world dialogue system based on a POMDP. It takes an N-best list of user utterance hypotheses as input and maintains a distribution over the most probable dialogue states in real time.

Hidden Information State system: dialogue acts. A user utterance such as "Is there um maybe a cheap place in the centre of town please?" is mapped to the dialogue act inform(pricerange=cheap, area=centre). A dialogue act consists of a dialogue act type (inform, request, confirm, …) and semantics given as slots and values (type=restaurant, food=Chinese, …).
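A small parser illustrates the structure of a dialogue act: a type plus slot-value pairs. The string syntax `type(slot=value, ...)` and the names below are assumptions for illustration. They are not the HIS system's actual interface:

```python
import re
from dataclasses import dataclass, field


@dataclass
class DialogueAct:
    act_type: str  # e.g. inform, request, confirm
    slots: dict = field(default_factory=dict)  # slot -> value ("" if none)


def parse_dialogue_act(text):
    """Parse a string such as 'inform(pricerange=cheap, area=centre)'."""
    match = re.fullmatch(r"\s*(\w+)\s*\((.*)\)\s*", text)
    if match is None:
        raise ValueError(f"not a dialogue act: {text!r}")
    act_type, body = match.groups()
    slots = {}
    # Skip empty items so that e.g. 'hello()' yields no slots
    for item in filter(None, (part.strip() for part in body.split(","))):
        slot, _, value = item.partition("=")
        slots[slot.strip()] = value.strip()
    return DialogueAct(act_type, slots)
```

For example, `parse_dialogue_act("request(task)")` gives an act of type `request` with the single slot `task` and an empty value.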

Hidden Information State system: ontology
[Figure: ontology tree. type: restaurant (area: north, south; food: Chinese, Indian), hotel (stars)]
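One possible encoding of the ontology uses nested dictionaries. It relies on a reading of the flattened figure, which is an assumption: a slot maps to its values, and a value may carry its own sub-slots. The hypothetical helper counts the fully specified goals, showing how the goal space multiplies as slots are added:

```python
# Hypothetical encoding of the ontology tree: slot -> {value -> sub-slots}
ONTOLOGY = {
    "type": {
        "restaurant": {
            "area": {"north": {}, "south": {}},
            "food": {"Chinese": {}, "Indian": {}},
        },
        "hotel": {"stars": {}},  # no values listed for stars
    }
}


def count_goals(slots):
    """Number of fully specified goals under a dict of slots."""
    total = 1
    for values in slots.values():
        # Sum over the alternative values; each value may open sub-slots.
        # A slot with no listed values (like stars here) contributes 1.
        total *= sum(count_goals(sub_slots) for sub_slots in values.values()) or 1
    return total
```

With one more area value, the restaurant goals would rise from 4 to 6. Each new slot or value multiplies the goal space, which is why HIS groups goals into partitions rather than enumerating them.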

Hidden Information State system: belief update. The update is made tractable by several approximations. Only the user acts in the N-best list are considered. Dialogue histories take a small number of values. Goals are grouped into partitions. All probabilities are handcrafted.

Dialogue history in the HIS system: ideally, the dialogue history represents everything that has happened. In practice, the history for each concept in the dialogue records four states: system informed, user informed, user requested and system requested. Each state is either 1 or 0, and the history is defined by a finite state automaton.
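Here is a minimal sketch of the per-concept history. It assumes that each concept tracks the four flags named on the slide and that a dialogue act simply sets the matching flag. The class and function names and the transitions are hypothetical. The actual HIS automaton may have more states:

```python
from dataclasses import dataclass


@dataclass
class ConceptHistory:
    """Hypothetical per-concept history: four binary flags (0 or 1)."""
    system_informed: int = 0
    user_informed: int = 0
    user_requested: int = 0
    system_requested: int = 0

    def update(self, speaker, act_type):
        # Deterministic transitions: an act sets the matching flag to 1
        if act_type == "inform":
            if speaker == "system":
                self.system_informed = 1
            else:
                self.user_informed = 1
        elif act_type == "request":
            if speaker == "system":
                self.system_requested = 1
            else:
                self.user_requested = 1
        return self


def update_history(history, speaker, act_type, slots):
    """Apply one dialogue act to the history of every concept it mentions."""
    for slot in slots:
        history.setdefault(slot, ConceptHistory()).update(speaker, act_type)
    return history
```

With four binary flags, each concept's history can take at most $2^4 = 16$ values, which keeps the dialogue history small.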

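HIS partitions: each partition represents a group of (the most probable) goals, and partitions are built dynamically during the dialogue. The goal transition probability $P(g_{t+1} \mid g_t, a_t)$ is set to a high value if $g_{t+1}$ is consistent with $g_t$ and $a_t$, and to a small value otherwise.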
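The handcrafted goal transition can be sketched under two simplifying assumptions. First, a goal is a dictionary of slot constraints. Second, a next goal counts as consistent if it keeps every constraint of the current goal. The system action $a_t$ is ignored here. The values 0.95 and 0.05 are illustrative, and the names are hypothetical:

```python
HIGH, LOW = 0.95, 0.05  # illustrative handcrafted scores


def is_consistent(next_goal, goal):
    """Hypothetical test: the next goal keeps every constraint of the current goal."""
    return all(next_goal.get(slot) == value for slot, value in goal.items())


def goal_transition(next_goal, goal, candidates):
    """P(g_{t+1} | g_t) over a finite list of candidate next goals.

    The system action a_t is ignored in this simplification.
    """
    scores = [HIGH if is_consistent(c, goal) else LOW for c in candidates]
    return scores[candidates.index(next_goal)] / sum(scores)
```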

HIS partitions: example
System: How may I help you? request(task)
User: I'd like a restaurant in the centre. inform(entity=venue, type=restaurant, area=centre)
[Figure: partition tree. The user act splits the goals successively on entity=venue, type=restaurant and area=central, ending in the partition entity=venue, type=restaurant, area=central]
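The splitting in the example can be sketched as follows. Each partition is a set of constraints `(slot, value, is_equal)` with a belief. Splitting on `slot == value` divides a partition's belief according to a prior. The `requires` argument stands in for the ontology, since a slot applies only under a given parent value. The priors below are illustrative, and all names are hypothetical:

```python
def split_partitions(partitions, slot, value, prior, requires=None):
    """Split every applicable partition on slot == value.

    partitions: dict mapping a frozenset of (slot, value, is_equal)
                constraints to the partition's belief
    prior:      hypothetical prior probability that slot == value
    requires:   optional (slot, value) that must hold for the split to apply
    """
    result = {}
    for constraints, belief in partitions.items():
        applies = requires is None or (*requires, True) in constraints
        decided = any(s == slot for s, _, _ in constraints)
        if not applies or decided:
            result[constraints] = belief  # left unchanged
            continue
        result[constraints | {(slot, value, True)}] = belief * prior
        result[constraints | {(slot, value, False)}] = belief * (1 - prior)
    return result


# Replaying the example user act with illustrative priors
partitions = {frozenset(): 1.0}
partitions = split_partitions(partitions, "entity", "venue", 0.9)
partitions = split_partitions(partitions, "type", "restaurant", 0.2,
                              requires=("entity", "venue"))
partitions = split_partitions(partitions, "area", "central", 0.5,
                              requires=("entity", "venue"))
```

Starting from a single partition, the three splits yield five partitions. Splitting only redistributes belief, so the total stays 1 until the observation reweights the partitions.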

Pruning
[Figure: the same partition tree, annotated with entity=venue 0.9, type=restaurant 0.2, area=central 0.5]
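The simplest form of pruning keeps only the most probable partitions and renormalises. This is a simplification, and the function name is hypothetical. The HIS system may instead merge low-probability partitions back together rather than discarding their belief:

```python
def prune_partitions(partitions, max_partitions):
    """Keep the max_partitions most probable partitions and renormalise."""
    ranked = sorted(partitions.items(), key=lambda item: item[1], reverse=True)
    kept = dict(ranked[:max_partitions])
    total = sum(kept.values())
    return {key: belief / total for key, belief in kept.items()}
```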

Hidden Information State system: any limitations?

Bayesian Update of Dialogue State system: further decomposes the dialogue state. This makes the belief state update tractable and allows the shapes of the distributions to be learned.

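Bayesian network model for dialogue
[Figure: the factorised network with each of $g$, $u$ and $d$ split per slot into food and area nodes, e.g. $g_t^{\text{food}}$, $d_t^{\text{food}}$, $u_t^{\text{food}}$, $g_t^{\text{area}}$, $d_t^{\text{area}}$, $u_t^{\text{area}}$, and likewise at $t+1$; also $a_t$, $r_t$, $o_t$, $o_{t+1}$]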
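The gain from splitting nodes per slot can be seen by counting. Suppose the goal has slots with value sets $V_1, \ldots, V_K$ and the per-slot nodes are treated as independent. This is a simplifying assumption, since the BUDS network can also link slots. The belief is then stored as $K$ per-slot distributions instead of one joint distribution:

$$ \underbrace{\prod_{i=1}^{K} |V_i|}_{\text{joint goal entries}} \;\longrightarrow\; \underbrace{\sum_{i=1}^{K} |V_i|}_{\text{per-slot entries}}, \qquad \text{e.g. } K = 10,\ |V_i| = 10:\quad 10^{10} \;\longrightarrow\; 100 $$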

Belief tracking: for each node $x$, start on one side and repeatedly compute $p(x \mid D_a)$. Then start from the other end and repeatedly compute $p(D_b \mid x)$. To get the marginal, multiply these and normalise: $p(x \mid D) \propto p(x \mid D_a)\, p(D_b \mid x)$.
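On a chain of nodes, the procedure on this slide is the forward-backward algorithm. Forward messages give $p(x_t \mid D_a)$, backward messages give a rescaled $p(D_b \mid x_t)$, and their normalised product is the marginal. The sketch below is minimal and assumes a chain with a shared transition matrix. The BUDS network is not a simple chain, so this shows the message passing rather than the BUDS implementation. The names are hypothetical:

```python
import numpy as np


def chain_marginals(prior, A, likelihoods):
    """Marginals p(x_t | D) on a chain via forward-backward message passing.

    prior:       p(x_1), shape (K,)
    A:           A[i, j] = p(x_{t+1} = j | x_t = i)
    likelihoods: likelihoods[t, k] = p(evidence at t | x_t = k), shape (T, K)
    """
    T, K = likelihoods.shape
    forward = np.zeros((T, K))  # forward[t] = p(x_t | D_a), D_a = evidence up to t
    backward = np.ones((T, K))  # backward[t] proportional to p(D_b | x_t), D_b = evidence after t
    f = prior * likelihoods[0]
    forward[0] = f / f.sum()
    for t in range(1, T):
        f = (forward[t - 1] @ A) * likelihoods[t]
        forward[t] = f / f.sum()
    for t in range(T - 2, -1, -1):
        bwd = A @ (likelihoods[t + 1] * backward[t + 1])
        backward[t] = bwd / bwd.sum()  # rescaling does not change the marginals
    marginals = forward * backward  # multiply the two messages
    return marginals / marginals.sum(axis=1, keepdims=True)
```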

Bayesian network model for dialogue
[Figure: the per-slot network with $a_t$, $r_t$, $o_t$, $o_{t+1}$ and an additional parameter node $\theta$]

Training policy using different parameters: the policy is trained using reinforcement learning (explained in the next lecture) and examined under different error rates in the user input.
[Figure: average reward]

Summary: the essential ingredients to include in the dialogue state; maintaining the belief state; dialogue modelling for real-world problems; learning the shapes of probability distributions.