Still Talking to Machines (Cognitively Speaking)
Steve Young
Machine Intelligence Laboratory, Information Engineering Division, Cambridge University Engineering Department, Cambridge, UK

Slide 2: Outline of Talk
- A brief historical perspective
- Cognitive User Interfaces
- Statistical Dialogue Modelling
- Scaling to the Real World
- System Architecture
- Some Examples and Results
- Conclusions and Future Work

Slide 3: Why Talk to Machines?
- It should be an easy and efficient way of finding out information and controlling behaviour.
- Sometimes it is the only way:
  - hands-busy users, e.g. surgeon, driver, package handler
  - no internet and no call-centres, e.g. areas of the third world
  - very small devices
- One day it might be fun, cf. Project Natal's Milo.

Slide 4: VODIS, circa 1985
[Figure: system diagram of VODIS, a natural-language, mixed-initiative train-timetable inquiry service: 150-word DTW connected speech recognition feeding a frame-based dialogue manager with recognition grammars and a DecTalk synthesiser, running on a PDP11/45 with 8 x 8086 processors, 128k memory and 2 x 5Mb disk.]
A collaboration between BT, Logica and Cambridge University.

Slide 5: Some Desirable Properties of a Spoken Dialogue System
- Able to support reasoning and inference:
  - interpret noisy inputs and resolve ambiguities in context
- Able to plan under uncertainty:
  - clearly defined communicative goals
  - performance quantified as rewards
  - plans optimized to maximize rewards
- Able to adapt on-line:
  - robust to speaker (accent, vocabulary, behaviour, ...)
  - robust to environment (noise, location, ...)
- Able to learn from experience:
  - progressively optimize models and plans over time
Together these define a Cognitive User Interface.
S. Young (2010). "Cognitive User Interfaces." Signal Processing Magazine 27(3).

Slide 6: Essential Ingredients of a Cognitive User Interface (CUI)
- Explicit representation of uncertainty using a probability model over dialogue states, e.g. using Bayesian networks.
- Inputs regarded as observations used to update the posterior state probabilities via inference.
- Responses defined by plans which map internal states to actions.
- The system's design objectives defined by rewards associated with specific state/action pairs.
- Plans optimized via reinforcement learning.
- Model parameters estimated via supervised learning and/or optimized via reinforcement learning.
Together these constitute a Partially Observable Markov Decision Process (POMDP).

Slide 7: A Framework for Statistical Dialogue Management
[Figure: the speech understanding component turns each user observation o_t into an updated belief b_t = P(s_t | o_t, b_{t-1}; λ), a distribution over dialogue states s_t governed by the model parameters λ. The policy π(a_t | b_t, θ), with parameters θ, maps the belief to a system action a_t, which is rendered back to the user by response generation. A reward function r assigns r(b_t, a_t) at each turn, and the total return is R = Σ_t r(b_t, a_t).]
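A minimal sketch of this loop in Python, with generic stand-ins for the model (update_belief, parameterised by λ), the policy π(a|b; θ) and the reward function; all names here are illustrative, not the actual system's API:

    def run_dialogue(update_belief, policy, reward, env, b0, max_turns=20):
        # One episode of the POMDP loop: observe o_t, update the belief
        # b_t = P(s_t | o_t, b_{t-1}; lambda), sample a_t ~ pi(a | b_t; theta),
        # and accumulate the return R = sum_t r(b_t, a_t).
        b, R = b0, 0.0
        for _ in range(max_turns):
            o = env.observe()        # noisy user input (e.g. N-best dialogue acts)
            b = update_belief(b, o)  # Bayesian inference over dialogue states
            a = policy.sample(b)     # map the belief state to a system action
            R += reward(b, a)        # design objectives as state/action rewards
            done = env.act(a)        # respond to the user
            if done:
                break
        return R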

Slide 8: Belief Tracking (aka Belief Monitoring)
The belief is updated following each new user input via the standard POMDP belief update, b_t(s') ∝ P(o_t | s') Σ_s P(s' | s, a_{t-1}) b_{t-1}(s). However, the state space is huge and this equation is intractable for practical systems, so we approximate:
- Track just the N most likely states: the Hidden Information State system (HIS).
- Factorise the state space and ignore all but the major conditional dependencies: the graphical model system (GMS, aka BUDS).
S. Young (2010). "The Hidden Information State Model." Computer Speech and Language 24(2).
B. Thomson (2010). "Bayesian Update of Dialogue State." Computer Speech and Language 24(4).
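A Python sketch of the first approximation, keeping the belief as a dictionary over the N most likely states; trans_prob and obs_prob here are simple stand-ins for the HIS partition-based models:

    def pruned_belief_update(b, action, obs, trans_prob, obs_prob, n_best=10):
        # Exact update: b'(s') ∝ P(o|s') * sum_s P(s'|s,a) b(s).
        # Tractable approximation: only enumerate successors of the states
        # already being tracked, then keep the N most likely (HIS-style).
        new_b = {}
        for s, p in b.items():
            for s2, pt in trans_prob(s, action).items():
                new_b[s2] = new_b.get(s2, 0.0) + p * pt * obs_prob(obs, s2)
        top = dict(sorted(new_b.items(), key=lambda kv: -kv[1])[:n_best])
        z = sum(top.values())
        return {s: p / z for s, p in top.items()}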

Slide 9: Dialogue State as a Dynamic Bayesian Network
[Figure: a dynamic Bayesian network for the tourist information domain (type = bar/restaurant; food = French/Chinese/none), with multiple nodes per concept. Each concept has a goal node g (modelling user behaviour), a user-act node u, a history node h (memory), and an observation node o (modelling recognition/understanding errors) at time t, with dependencies carried into the next time slice t+1.]
J. Williams (2007). "POMDPs for Spoken Dialog Systems." Computer Speech and Language 21(2).
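A minimal data-structure sketch of that factorisation, with one goal/user-act/history marginal per concept; the real BUDS network retains the major conditional dependencies rather than assuming full independence:

    from dataclasses import dataclass, field

    @dataclass
    class SlotBelief:
        # Marginals for one concept's goal, user-act and history nodes.
        goal: dict = field(default_factory=dict)      # e.g. {"French": 0.6, "Chinese": 0.1, "none": 0.3}
        user_act: dict = field(default_factory=dict)  # distribution over the last user act for this slot
        history: dict = field(default_factory=dict)   # e.g. {"mentioned": 0.7, "new": 0.3}

    # The belief over the full dialogue state factorises across concepts:
    belief = {
        "type": SlotBelief(goal={"bar": 0.5, "restaurant": 0.5}),
        "food": SlotBelief(goal={"French": 1/3, "Chinese": 1/3, "none": 1/3}),
    }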

Slide 10: Dialogue Model Parameters (ignoring history nodes for simplicity)
[Figure: the network unrolled over time slices t and t+1, with observation, goal and user-act nodes per concept (o_type, g_type, u_type and o_food, g_food, u_food). Example parameter tables for the food concept:]

User-act model p(u|g):

    u \ g       g=French   g=Chinese   g=None
    French        0.7          0          0
    Chinese        0          0.7         0
    NoMention     0.3         0.3        1.0

Observation model p(o|u) (the o=French and o=Chinese rows were not recoverable from the slide):

    o \ u       u=French   u=Chinese   u=NoMention
    NoMention      0           0          1.0
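A worked update of the food goal using these tables. Since the confusion rows of p(o|u) did not survive extraction, the 0.8/0.2 values below are assumed purely for illustration, as marked in the comments:

    P_U_GIVEN_G = {  # p(u | g), from the slide
        "French":    {"French": 0.7, "Chinese": 0.0, "None": 0.0},
        "Chinese":   {"French": 0.0, "Chinese": 0.7, "None": 0.0},
        "NoMention": {"French": 0.3, "Chinese": 0.3, "None": 1.0},
    }
    # p(o | u): only the NoMention row survived on the slide; the 0.8 / 0.2
    # confusion values below are ASSUMED for illustration, not from the talk.
    P_O_GIVEN_U = {
        "French":    {"French": 0.8, "Chinese": 0.2, "NoMention": 0.0},
        "Chinese":   {"French": 0.2, "Chinese": 0.8, "NoMention": 0.0},
        "NoMention": {"French": 0.0, "Chinese": 0.0, "NoMention": 1.0},
    }

    def update_goal(prior, obs):
        # Posterior over the food goal: p(g|o) ∝ sum_u p(o|u) p(u|g) p(g),
        # marginalising out the unobserved user act u.
        post = {g: pg * sum(P_O_GIVEN_U[obs][u] * P_U_GIVEN_G[u][g]
                            for u in P_U_GIVEN_G)
                for g, pg in prior.items()}
        z = sum(post.values())
        return {g: p / z for g, p in post.items()}

    prior = {"French": 1/3, "Chinese": 1/3, "None": 1/3}
    print(update_goal(prior, "French"))  # -> {'French': 0.8, 'Chinese': 0.2, 'None': 0.0}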

Slide 11: Belief Monitoring (Tracking)
[Figure: belief bar charts over g_type (bar/restaurant) and g_food (French/Chinese/none) across two turns. At t=1 the user says inform(food=french) with confidence 0.9; the system asks confirm(food=french); at t=2 the user answers affirm() with confidence 0.9, and the belief in food=French sharpens further.]

Slide 12: Belief Monitoring (Tracking)
[Figure: the same network, now with a 2-best input at t=1: inform(type=bar, food=french) with confidence 0.6 and inform(type=restaurant, food=french) with confidence 0.3. The system asks confirm(type=restaurant, food=french); at t=2 the user answers affirm() with confidence 0.9, shifting belief towards type=restaurant, food=French.]

Slide 13: Belief Monitoring (Tracking)
[Figure: at t=1 the user says inform(type=bar) with low confidence 0.4; the system responds select(type=bar, type=restaurant); at t=2 the user again says inform(type=bar) with confidence 0.4. Two consistent low-confidence observations accumulate, raising the belief in type=bar.]

Slide 14: Choosing the Next Action: the Policy
[Figure: the full belief over g_type and g_food is quantized into a summary state; a policy vector over all possible summary actions (inform, select, confirm, etc.) is sampled, here giving a = select, which is then mapped back into a full action such as select(type=bar, type=restaurant).]
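A sketch of this summary-space machinery, assuming a softmax policy over summary actions and a hand-written grid quantizer; the real system's features and summary-to-full mapping rules are considerably richer:

    import numpy as np

    SUMMARY_ACTIONS = ["inform", "select", "confirm", "request"]

    def summarise(belief):
        # Quantize the full belief into a coarse feature vector, e.g. the
        # top-two goal probabilities per concept, rounded onto a grid.
        feats = []
        for slot in ("type", "food"):
            top2 = sorted(belief[slot].values(), reverse=True)[:2]
            feats += [round(p, 1) for p in top2]
        return np.array(feats)

    def action_probs(features, theta):
        # Softmax policy pi(a | b; theta) over the summary actions.
        logits = theta @ features          # theta: |actions| x |features|
        p = np.exp(logits - logits.max())
        return p / p.sum()

    rng = np.random.default_rng(0)
    belief = {"type": {"bar": 0.5, "restaurant": 0.4, "none": 0.1},
              "food": {"French": 0.6, "Chinese": 0.3, "none": 0.1}}
    theta = rng.normal(size=(len(SUMMARY_ACTIONS), 4))
    a = rng.choice(SUMMARY_ACTIONS, p=action_probs(summarise(belief), theta))
    # The sampled summary action (e.g. "select") is then mapped back into a
    # full action using the belief, e.g. select(type=bar, type=restaurant).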

Slide 15: Policy Optimization
The policy parameters θ are chosen to maximize the expected reward J(θ). Natural gradient ascent works well: the gradient is estimated by sampling dialogues, and in practice the Fisher information matrix does not need to be explicitly computed. This is the Natural Actor-Critic algorithm.
J. Peters and S. Schaal (2008). "Natural Actor-Critic." Neurocomputing 71(7-9).
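The equations on this slide were images in the original deck; the natural gradient update they refer to, in Peters and Schaal's formulation, is

    θ_{k+1} = θ_k + α F(θ_k)⁻¹ ∇_θ J(θ_k),   F(θ) = E_{π_θ}[ ∇_θ log π_θ(a|b) ∇_θ log π_θ(a|b)ᵀ ]

where F(θ) is the Fisher information matrix. In the episodic Natural Actor-Critic, fitting a compatible linear value-function approximation by least squares yields the natural gradient weights directly, which is why F never has to be formed explicitly.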

Slide 16: Dialogue Model Parameter Optimization
Approximating the belief distribution via feature vectors prevents differentiating the policy with respect to the dialogue model parameters λ. However, a trick can be used: assume the parameters λ are drawn from a prior that is differentiable with respect to its own hyperparameters, then optimize the expected reward with respect to those hyperparameters and sample from the prior to obtain λ. This is the Natural Belief-Critic algorithm. It is also possible to do maximum-likelihood model parameter estimation using Expectation Propagation.
F. Jurcicek (2010). "Natural Belief-Critic." Interspeech 2010.
B. Thomson (2010). "Parameter Learning for POMDP Spoken Dialogue Models." SLT 2010.
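A sketch of that outer loop, assuming for simplicity a single Dirichlet prior over one of the model's probability tables (the actual Natural Belief-Critic parameterisation is given in Jurcicek (2010)); run_dialogues and natural_grad are stand-ins:

    import numpy as np

    def natural_belief_critic(alpha, run_dialogues, natural_grad,
                              iters=100, lr=0.1, n_samples=50):
        # The reward is not differentiable w.r.t. the model parameters, so
        # place a prior p(lambda; alpha) over them, sample lambda, measure
        # dialogue reward, and ascend the natural gradient of the expected
        # reward with respect to the prior parameters alpha.
        rng = np.random.default_rng(0)
        for _ in range(iters):
            samples = []
            for _ in range(n_samples):
                lam = rng.dirichlet(alpha)          # one draw of model parameters
                samples.append((lam, run_dialogues(lam)))
            alpha = alpha + lr * natural_grad(alpha, samples)
        return alpha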

Slide 17: Performance Comparison in the Simulated TownInfo Domain
[Figure: reward curves for four configurations: handcrafted model with handcrafted policy, handcrafted model with trained policy, trained model with trained policy, and trained model with handcrafted policy. Reward = 100 for success, minus 1 for each turn taken.]

Slide 18: Scaling up to Real World Problems
Several of the key ideas have already been covered:
- compact representation of dialogue state, e.g. HIS, BUDS
- mapping belief states into summary states via quantisation, feature vectors, etc.
- mapping actions in summary space back into full space
But inference itself is also a problem…

Slide 19: CamInfo Ontology
[Figure: the CamInfo ontology produces a complex dialogue state: many concepts, many values per concept, and multiple nodes per concept.]

Slide 20: Belief Propagation Times
[Figure: inference time plotted against network branching factor for three methods: standard loopy belief propagation (LBP), LBP with grouping, and LBP with grouping and a constant probability of change.]
B. Thomson (2010). "Bayesian Update of Dialogue State." Computer Speech and Language 24(4).

Slide 21: Architecture of the Cambridge Statistical SDS (run-time mode)
[Figure: input speech y enters the speech recogniser, which outputs words w with p(w|y); the semantic decoder converts these into dialogue acts v with p(v|y); the dialogue manager (HIS or BUDS) selects a system dialogue act a; the message generator produces words m with p(m|a) and the speech synthesiser renders them as output speech with p(x|a). Corpus data is used to train the components.]
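A skeletal version of that run-time pipeline; each stage is a stand-in interface for illustration, not the actual Cambridge components:

    def run_turn(audio, recogniser, decoder, manager, generator, synthesiser):
        # One system turn through the statistical SDS pipeline.
        word_hyps = recogniser(audio)   # N-best word hypotheses w with p(w|y)
        act_hyps = decoder(word_hyps)   # N-best dialogue acts v with p(v|y)
        sys_act = manager(act_hyps)     # HIS/BUDS: belief update + policy
        message = generator(sys_act)    # output words m with p(m|a)
        return synthesiser(message)     # output speech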

Slide 22: Architecture of the Cambridge Statistical SDS (training mode)
[Figure: for training, the recognition/synthesis chain is replaced by a user simulator and an error model: the simulator emits user dialogue acts, and the error model corrupts them into the N-best act hypotheses p(v|y) consumed by the dialogue manager (HIS or BUDS). Both are built from corpus data.]
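A sketch of the training loop this figure implies, with a hypothetical simulator/error-model interface:

    def train_policy(manager, simulator, error_model, episodes=100_000):
        # Optimise the dialogue policy against a simulated user: the error
        # model corrupts each simulated act into noisy N-best hypotheses, so
        # the policy learns to act under realistic recognition uncertainty.
        for _ in range(episodes):
            manager.reset(); simulator.reset()
            sys_act, done = None, False
            while not done:
                user_act = simulator.respond(sys_act)       # true user act
                noisy_hyps = error_model.corrupt(user_act)  # confusions + scores
                sys_act, done = manager.step(noisy_hyps)    # belief update + policy
            manager.update_policy(simulator.reward())       # e.g. an NAC update
        return manager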

Slide 23: CMU Let's Go Spoken Dialogue Challenge
Organised by the Dialog Research Center, CMU. A telephone-based spoken dialog system providing bus schedule information for the City of Pittsburgh, PA (USA), based on an existing system with real users.
Two-stage evaluation process:
1. Control Test with recruited subjects given specific known tasks.
2. Live Test with competing implementations switched according to a daily schedule.
Full results to be presented at a special session at SLT.

Slide 24: Let's Go 2010 Control Test Results (all qualifying systems)
[Figure: predicted success rate plotted against word error rate (WER) for all qualifying systems. Average success = 64.8%, average WER = 42.4%. System X: 65% success at 42% WER; System Y: 75% success at 34% WER; System Z: 89% success at 33% WER.]
B. Thomson (2010). "Bayesian Update of State for the Let's Go Spoken Dialogue Challenge." SLT 2010.

Slide 25: CamInfo Demo

Slide 26: Conclusions
- End-to-end statistical dialogue systems can be built and are competitive.
- The core is a POMDP-based dialogue manager which provides an explicit representation of uncertainty, with the following benefits:
  - robust to recognition errors
  - an objective measure of goodness via the reward function
  - the ability to optimize performance against objectives
  - reduced development costs: no hand-tuning, no complex design processes, easily ported to new applications
  - natural dialogue: say anything, any time
- Still much to do: faster learning, off-policy learning, long-term adaptation, dynamic ontologies, multi-modal input/output.
- Perhaps talking to machines is within reach…

Slide 27: Credits
EU FP7 Project: Computational Learning in Adaptive Systems for Spoken Conversation
Spoken Dialogue Management using Partially Observable Markov Decision Processes
Past and present members of the CUED Dialogue Systems Group: Milica Gasic, Filip Jurcicek, Simon Keizer, Fabrice Lefevre, Francois Mairesse, Jorge Prombonas, Jost Schatzmann, Matt Stuttle, Blaise Thomson, Karl Weilhammer, Jason Williams, Hui Ye, Kai Yu