Still Talking to Machines (Cognitively Speaking)
Machine Intelligence Laboratory, Information Engineering Division, Cambridge University Engineering Department

1  Still Talking to Machines (Cognitively Speaking)
Machine Intelligence Laboratory, Information Engineering Division
Cambridge University Engineering Department, Cambridge, UK
Steve Young

2  Outline of Talk
Interspeech Plenary, September 2010. © Steve Young.
- A brief historical perspective
- Cognitive User Interfaces
- Statistical Dialogue Modelling
- Scaling to the Real World
- System Architecture
- Some Examples and Results
- Conclusions and Future Work

3  Why Talk to Machines?
- It should be an easy and efficient way of finding out information and controlling behaviour.
- Sometimes it is the only way:
  - hands busy, e.g. surgeon, driver, package handler
  - no internet and no call centres, e.g. parts of the developing world
  - very small devices
- One day it might be fun, cf. the Project Natal "Milo" demo.

4  VODIS, circa 1985
A natural-language, mixed-initiative train-timetable inquiry service; a collaboration between BT, Logica and Cambridge University.
- 150-word DTW connected speech recognition (the Logos recogniser) with recognition grammars
- frame-based dialogue manager
- DecTalk synthesiser
- Hardware: PDP11/45 plus 8 x 8086 processors, 128k memory, 2 x 5Mb disk

5  Some Desirable Properties of a Spoken Dialogue System
- Able to support reasoning and inference: interpret noisy inputs and resolve ambiguities in context.
- Able to plan under uncertainty: clearly defined communicative goals, performance quantified as rewards, plans optimized to maximize rewards.
- Able to adapt on-line: robust to speaker (accent, vocabulary, behaviour, ...) and to environment (noise, location, ...).
- Able to learn from experience: progressively optimize models and plans over time.
In short, a Cognitive User Interface.
S. Young (2010). "Cognitive User Interfaces." IEEE Signal Processing Magazine 27(3).

6  Essential Ingredients of a Cognitive User Interface (CUI)
- Explicit representation of uncertainty using a probability model over dialogue states, e.g. a Bayesian network.
- Inputs regarded as observations used to update the posterior state probabilities via inference.
- Responses defined by plans which map internal states to actions.
- The system's design objectives defined by rewards associated with specific state/action pairs.
- Plans optimized via reinforcement learning.
- Model parameters estimated via supervised learning and/or optimized via reinforcement learning.
Together these constitute a Partially Observable Markov Decision Process (POMDP).

7  A Framework for Statistical Dialogue Management
[Block diagram] Speech understanding maps each user observation o_t into an update of the belief state, b_t = P(s_t | o_t, b_{t-1}; λ), where λ are the dialogue model (distribution) parameters. A stochastic policy π(a_t | b_t, θ), with policy parameters θ, selects the action a_t passed to response generation, and a reward function r scores each turn, giving total reward R = Σ_t r(b_t, a_t).
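The belief update at the heart of this framework can be sketched for a tiny discrete state space. The two-state food goal, the transition and observation numbers, and the action name below are illustrative assumptions, not values from the talk.

```python
# Exact POMDP belief update: b_t(s') ∝ P(o | s') * Σ_s P(s' | s, a) * b_{t-1}(s).
def belief_update(b, a, o, T, O):
    """b: {state: prob}; T[a][s][s2]: transition model; O[s2][o]: observation likelihood."""
    new_b = {}
    for s2 in b:
        # predict (sum over previous states), then weight by observation likelihood
        new_b[s2] = O[s2][o] * sum(T[a][s][s2] * b[s] for s in b)
    z = sum(new_b.values())            # normaliser = P(o | b, a)
    return {s: p / z for s, p in new_b.items()}

# Toy models: the user's goal rarely changes; observations are right 80% of the time.
T = {"ask": {"French":  {"French": 0.9, "Chinese": 0.1},
             "Chinese": {"French": 0.1, "Chinese": 0.9}}}
O = {"French":  {"French": 0.8, "Chinese": 0.2},
     "Chinese": {"French": 0.2, "Chinese": 0.8}}

b = {"French": 0.5, "Chinese": 0.5}
b = belief_update(b, "ask", "French", T, O)
# starting from a uniform belief, one "French" observation gives b["French"] = 0.8
```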

8  Belief Tracking (aka Belief Monitoring)
The belief is updated following each new user input:
b_t(s_t) ∝ P(o_t | s_t) Σ_{s_{t-1}} P(s_t | s_{t-1}, a_{t-1}) b_{t-1}(s_{t-1})
However, the state space is huge and this equation is intractable for practical systems, so we approximate:
- track just the N most likely states: the Hidden Information State system (HIS);
- factorise the state space and ignore all but the major conditional dependencies: the graphical model system (GMS, aka BUDS).
S. Young (2010). "The Hidden Information State Model." Computer Speech and Language 24(2).
B. Thomson (2010). "Bayesian Update of Dialogue State." Computer Speech and Language 24(4).
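The HIS idea of tracking just the N most likely states can be sketched as a prune-and-renormalise step after each update. The hypothesis names and belief values below are illustrative:

```python
def prune_belief(b, n):
    """Keep the n most probable dialogue-state hypotheses and renormalise (HIS-style pruning sketch)."""
    top = sorted(b.items(), key=lambda kv: kv[1], reverse=True)[:n]
    z = sum(p for _, p in top)
    return {s: p / z for s, p in top}

b = {"bar+French": 0.5, "restaurant+French": 0.3,
     "bar+Chinese": 0.15, "restaurant+Chinese": 0.05}
b2 = prune_belief(b, 2)
# the two best hypotheses survive, renormalised to 0.625 and 0.375
```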

9  Dialogue State as a Dynamic Bayesian Network
Each concept contributes a goal node g, a user-act node u and a history (memory) node h; the observation o at time t depends on the user act, modelling recognition/understanding errors, while u depends on the goal, modelling user behaviour. The nodes are replicated in the next time slice t+1. Example from the tourist information domain: type = {bar, restaurant} and food = {French, Chinese, none}, giving nodes o_type, g_type, u_type, h_type and o_food, g_food, u_food, h_food.
J. Williams (2007). "POMDPs for Spoken Dialog Systems." Computer Speech and Language 21(2).

10  Dialogue Model Parameters (ignoring history nodes for simplicity)
The network for time slices t and t+1 is parameterised by conditional probability tables such as (for the food concept):

p(u | g)          g = French   g = Chinese   g = None
u = French           0.7          0             0
u = Chinese          0            0.7           0
u = NoMention        0.3          0.3           1.0

p(o | u)          u = French   u = Chinese   u = NoMention
o = French           0.8          0.2           0
o = Chinese          0.2          0.8           0
o = NoMention        0            0             1.0
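With these two tables, the posterior over the food goal after one noisy observation follows directly from Bayes' rule: p(g | o) ∝ p(g) Σ_u p(o | u) p(u | g). A minimal sketch using the slide's numbers and a uniform prior (the uniform prior is an assumption for illustration):

```python
GOALS = ["French", "Chinese", "None"]
ACTS  = ["French", "Chinese", "NoMention"]
P_U_GIVEN_G = {                       # p(u | g), from the slide
    "French":  {"French": 0.7, "Chinese": 0.0, "NoMention": 0.3},
    "Chinese": {"French": 0.0, "Chinese": 0.7, "NoMention": 0.3},
    "None":    {"French": 0.0, "Chinese": 0.0, "NoMention": 1.0},
}
P_O_GIVEN_U = {                       # p(o | u), from the slide; outer key is u
    "French":    {"French": 0.8, "Chinese": 0.2, "NoMention": 0.0},
    "Chinese":   {"French": 0.2, "Chinese": 0.8, "NoMention": 0.0},
    "NoMention": {"French": 0.0, "Chinese": 0.0, "NoMention": 1.0},
}

def goal_posterior(o, prior=None):
    """p(g | o) ∝ p(g) * Σ_u p(o | u) * p(u | g), marginalising the hidden user act."""
    if prior is None:
        prior = {g: 1.0 / len(GOALS) for g in GOALS}
    scores = {g: prior[g] * sum(P_O_GIVEN_U[u][o] * P_U_GIVEN_G[g][u] for u in ACTS)
              for g in GOALS}
    z = sum(scores.values())
    return {g: s / z for g, s in scores.items()}

post = goal_posterior("French")
# p(g=French | o=French) = 0.56 / 0.70 = 0.8, p(g=Chinese | o=French) = 0.2
```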

11  Belief Monitoring (Tracking): Example 1
t=1: the user input inform(food=french), confidence 0.9, sharpens the belief in g_food = French.
t=2: the system asks confirm(food=french) and the user's affirm(), confidence 0.9, sharpens it further.
(Figure: bar charts over the type and food values at t=1 and t=2.)

12  Belief Monitoring (Tracking): Example 2
t=1: an ambiguous input yields inform(type=bar, food=french) {0.6} and inform(type=restaurant, food=french) {0.3}, splitting the belief mass over type.
t=2: the system asks confirm(type=restaurant, food=french); the user's affirm() {0.9} resolves the ambiguity.
(Figure: bar charts over the type and food values at t=1 and t=2.)

13  Belief Monitoring (Tracking): Example 3
t=1: a low-confidence inform(type=bar) {0.4} arrives, so the system asks select(type=bar, type=restaurant).
t=2: the user again gives inform(type=bar) {0.4}; though each input is individually unreliable, the accumulated evidence shifts the belief firmly towards bar.
(Figure: bar charts over the type and food values at t=1 and t=2.)

14  Choosing the Next Action: the Policy
The full belief state (e.g. the distributions over g_type and g_food) is quantized into a binary feature vector in a low-dimensional summary space. A policy vector assigns a probability to each possible summary action (inform, select, confirm, etc.); a summary action is sampled from this distribution, here a = select, and then mapped back into a full action in the master space, e.g. select(type=bar, type=restaurant).
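The quantize, score and sample steps can be sketched as a softmax policy over summary actions. The binning scheme, slot names and weight values below are illustrative assumptions, not the system's actual parameterisation:

```python
import math
import random

SUMMARY_ACTIONS = ["inform", "select", "confirm"]

def features(belief):
    """Quantize the top belief per slot into coarse confidence bins (illustrative)."""
    f = []
    for slot in ("type", "food"):
        top = max(belief[slot].values())
        f += [1.0 if lo <= top < hi else 0.0
              for lo, hi in ((0.0, 0.5), (0.5, 0.8), (0.8, 1.01))]
    return f

def policy_sample(belief, theta, rng=random):
    """Sample a summary action from the softmax over per-action linear scores."""
    f = features(belief)
    scores = [sum(w * x for w, x in zip(theta[a], f)) for a in SUMMARY_ACTIONS]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    r, acc = rng.random(), 0.0
    for a, e in zip(SUMMARY_ACTIONS, exps):
        acc += e / z
        if r <= acc:
            return a
    return SUMMARY_ACTIONS[-1]

belief = {"type": {"bar": 0.6, "restaurant": 0.4},
          "food": {"French": 0.9, "Chinese": 0.1}}
theta = {"inform": [0.0] * 6, "select": [0.0] * 6, "confirm": [5.0] * 6}
a = policy_sample(belief, theta, rng=random.Random(0))
```

A separate hand-written (or learned) map then expands the sampled summary action back into a full action over the master belief space.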

15  Policy Optimization
The policy parameters θ are chosen to maximize the expected reward J(θ) = E[R | θ]. Natural gradient ascent works well: the plain gradient ∇J(θ) is premultiplied by the inverse of the Fisher information matrix F(θ). The gradient is estimated by sampling dialogues, and in practice the Fisher information matrix does not need to be explicitly computed. This is the Natural Actor-Critic algorithm.
J. Peters and S. Schaal (2008). "Natural Actor-Critic." Neurocomputing 71(7-9).
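To make the sampling idea concrete, here is the plain (non-natural) policy-gradient estimate that Natural Actor-Critic builds on, applied to a toy two-action bandit rather than a dialogue system. This is a sketch under that simplifying assumption; NAC additionally premultiplies the estimate by the inverse Fisher matrix, which the actor-critic formulation obtains without forming F explicitly.

```python
import math
import random

def softmax(theta):
    m = max(theta)
    e = [math.exp(t - m) for t in theta]
    z = sum(e)
    return [x / z for x in e]

def reinforce(rewards, steps=2000, lr=0.1, seed=0):
    """Sampled-gradient ascent on J(θ): Δθ ∝ (r - baseline) * ∇ log π(a; θ)."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    baseline = 0.0
    for _ in range(steps):
        p = softmax(theta)
        a = 0 if rng.random() < p[0] else 1   # sample an action from π
        r = rewards[a]
        baseline += 0.05 * (r - baseline)     # running baseline reduces variance
        for i in range(2):
            # ∇ log π(a) for a softmax policy: indicator(i == a) - p[i]
            theta[i] += lr * (r - baseline) * ((1.0 if i == a else 0.0) - p[i])
    return softmax(theta)

p = reinforce([1.0, 0.0])
# the learned policy strongly prefers the rewarding action 0
```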

16  Dialogue Model Parameter Optimization
Approximating the belief distribution via feature vectors prevents differentiating the policy with respect to the dialogue model parameters λ. However, a trick can be used: assume the parameters λ are drawn from a prior p(λ; α) which is differentiable with respect to α; then optimize the expected reward with respect to α and sample the prior to get λ. This is the Natural Belief-Critic algorithm. It is also possible to do maximum-likelihood model parameter estimation using Expectation Propagation.
F. Jurcicek (2010). "Natural Belief-Critic." Interspeech 2010.
B. Thomson (2010). "Parameter Learning for POMDP Spoken Dialogue Models." SLT 2010.

17  Performance Comparison in the Simulated TownInfo Domain
(Figure: reward curves for four configurations.)
- Handcrafted model and handcrafted policy
- Handcrafted model and trained policy
- Trained model and handcrafted policy
- Trained model and trained policy
Reward = 100 for success, minus 1 for each turn taken.

18  Scaling up to Real-World Problems
Several of the key ideas have already been covered:
- compact representation of dialogue state, e.g. HIS, BUDS
- mapping belief states into summary states via quantisation, feature vectors, etc.
- mapping actions in summary space back into the full space
But inference itself is also a problem ...

19  CamInfo Ontology
A complex dialogue state: many concepts, many values per concept, and multiple nodes per concept.
(Figure: the CamInfo ontology graph.)

20  Belief Propagation Times
(Figure: inference time versus network branching factor.) Standard loopy belief propagation (LBP) is compared with LBP with grouping of values, and with LBP with grouping plus a constant probability of change; the grouped variants scale far better as the branching factor grows.
B. Thomson (2010). "Bayesian Update of Dialogue State." Computer Speech and Language 24(4).

21  Architecture of the Cambridge Statistical SDS (run-time mode)
Pipeline: user speech y goes to the speech recogniser, producing a distribution over words p(w|y); the semantic decoder turns this into a distribution over dialogue acts p(v|y); the dialogue manager (HIS or BUDS) selects a system dialogue act a; the message generator produces words p(m|a); and the speech synthesiser produces speech p(x|a) back to the user. All components are trained from corpus data.
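The shape of that pipeline can be sketched as a chain of simple callables. Every component below is a canned stub, and the interfaces, hypothesis strings and function names are assumptions for illustration only, not the Cambridge system's actual API:

```python
def recognise(speech):
    """Stub recogniser: p(w | y) as an N-best list of (word string, probability)."""
    return [("cheap chinese food", 0.7), ("sheep chinese food", 0.3)]

def decode(word_hyps):
    """Stub semantic decoder: p(v | y) as (dialogue act, probability) hypotheses."""
    return [("inform(food=Chinese,pricerange=cheap)", 0.9)]

def dialogue_manager(act_hyps, belief):
    """Stub manager: update belief from the act hypotheses, pick a system act."""
    belief["food"] = "Chinese"
    return "confirm(food=Chinese)"

def generate(act):
    """Stub message generator: map the system dialogue act to words."""
    return "Did you say you want Chinese food?"

def turn(speech, belief):
    """One full run-time turn through the pipeline."""
    return generate(dialogue_manager(decode(recognise(speech)), belief))

belief = {}
reply = turn("<audio>", belief)
```

The point of the uniform interfaces is that each stage passes distributions over hypotheses downstream rather than a single best guess, so the dialogue manager sees the recogniser's uncertainty.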

22  Architecture of the Cambridge Statistical SDS (training mode)
For training, the recogniser and real user are replaced by a user simulator plus an error model, both estimated from corpus data; the simulator feeds dialogue-act hypotheses p(v|y) to the dialogue manager (HIS or BUDS), which returns system dialogue acts a.

23  CMU Let's Go Spoken Dialogue Challenge
Organised by the Dialog Research Center, CMU; see http://www.dialrc.org/sdc/
A telephone-based spoken dialogue system providing bus schedule information for the city of Pittsburgh, PA (USA), based on an existing system with real users. Two-stage evaluation process:
1. Control test with recruited subjects given specific known tasks.
2. Live test with competing implementations switched according to a daily schedule.
Full results to be presented at a special session at SLT.

24  Let's Go 2010 Control Test Results (all qualifying systems)
(Figure: success rate versus word error rate, with a predicted success-rate trend line.)
Average success = 64.8%; average WER = 42.4%. Example points: System X, 65% success at 42% WER; System Y, 75% success at 34% WER; System Z, 89% success at 33% WER.
B. Thomson (2010). "Bayesian Update of State for the Let's Go Spoken Dialogue Challenge." SLT 2010.

25  CamInfo Demo

26  Conclusions
- End-to-end statistical dialogue systems can be built and are competitive.
- The core is a POMDP-based dialogue manager which provides an explicit representation of uncertainty, with the following benefits:
  - robust to recognition errors
  - objective measure of goodness via the reward function
  - ability to optimize performance against objectives
  - reduced development costs: no hand-tuning, no complex design processes, easily ported to new applications
  - natural dialogue: say anything, any time
- Still much to do: faster learning, off-policy learning, long-term adaptation, dynamic ontologies, multi-modal input/output.
- Perhaps talking to machines is within reach ...

27  Credits
EU FP7 Project: Computational Learning in Adaptive Systems for Spoken Conversation.
Project: Spoken Dialogue Management using Partially Observable Markov Decision Processes.
Past and present members of the CUED Dialogue Systems Group: Milica Gasic, Filip Jurcicek, Simon Keizer, Fabrice Lefevre, Francois Mairesse, Jorge Prombonas, Jost Schatzmann, Matt Stuttle, Blaise Thomson, Karl Weilhammer, Jason Williams, Hui Ye, Kai Yu.

