belief updating in spoken dialog systems Dan Bohus Computer Science Department Carnegie Mellon University Pittsburgh, PA 15213 acknowledgements Alex Rudnicky,


1/36 belief updating in spoken dialog systems
Dan Bohus
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA
acknowledgements: Alex Rudnicky, CMU; Eric Horvitz & Tim Paek, MSR; Antoine Raux

2/36 spoken dialog systems
- use natural language processing technology
- engage in a goal-oriented conversation
- research community
  - information access, command-and-control
  - personal assistants, taskable agents, tutoring systems
- industry
  - simpler automated phone systems

3/36 the problem … misunderstandings lead to interaction breakdowns

4/36 more concretely …
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT same PAY]
S: Sorry I’m not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [for minute since HEY]
S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I’m traveling to Birmingham [THE TRAVELING to berlin P_M]
S: traveling in the afternoon. Okay what day would you be departing chicago
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [flight destination mr WEEKEND]
S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm arrives Seoul at ………

5/36 understanding errors
- stem from the speech recognition process
- difficult operating conditions
- typical word-error-rates
  - 20-30%
  - up to 50% for non-natives
- strong negative impact on interactions
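The word-error-rate quoted above is conventionally the word-level edit distance between what the user said and what the recognizer produced, divided by the number of words actually spoken. The sketch below is a generic illustration of that metric, not code from the talk; the function name `word_error_rate` and the case-insensitive, whitespace-based tokenization are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are whitespace-separated and compared case-insensitively.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Turn from the example dialog: one inserted word over four spoken words.
    print(word_error_rate("the tenth of august", "AT THE TENTH OF AUGUST"))
```

Because insertions count as errors, the rate can exceed 100% when the recognizer outputs more words than were spoken.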

6/36 pathways to increased robustness
- gracefully handle errors through conversation
- improve recognition
- detect the problems
- develop a set of recovery strategies
- know how to choose between them (policy)

7/36 guarding against misunderstandings
S: […] What city are you leaving from?
U: [for minute since HEY]
S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: [SEOUL]
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
S: traveling in the afternoon. Okay what day would you be departing chicago
U: [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: [flight destination mr WEEKEND]
confidence scores (in extraction order): 0.72 / 0.35 / 0.58 / 0.65 / 0.28 / 0.07

8/36 guarding against misunderstandings
(dialog and confidence scores repeated from slide 7/36)
arrival = {Seoul / 0.65}
confirmation actions
- reject
- explicit confirmation: Did you say Seoul?
- implicit confirmation: traveling to Seoul … What day did you need to travel?
- accept
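One common way to pick among the four confirmation actions is to threshold the confidence score. The sketch below illustrates that idea only; the threshold values, the `Action` enum, and the function name `choose_action` are hypothetical and are not taken from the talk.

```python
from enum import Enum


class Action(Enum):
    REJECT = "reject"
    EXPLICIT_CONFIRM = "explicit confirmation"
    IMPLICIT_CONFIRM = "implicit confirmation"
    ACCEPT = "accept"


def choose_action(confidence: float,
                  reject_below: float = 0.2,
                  explicit_below: float = 0.5,
                  implicit_below: float = 0.9) -> Action:
    """Map a confidence score in [0, 1] to a confirmation action.

    Assumption: all three thresholds are illustrative placeholders.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence < reject_below:
        return Action.REJECT
    if confidence < explicit_below:
        return Action.EXPLICIT_CONFIRM
    if confidence < implicit_below:
        return Action.IMPLICIT_CONFIRM
    return Action.ACCEPT


if __name__ == "__main__":
    # arrival = {Seoul / 0.65} falls in the implicit-confirmation band
    # under these placeholder thresholds.
    print(choose_action(0.65).value)
```

A fixed-threshold policy of this kind only decides what to say next; it does not say how the belief should change once the user responds, which is the problem the following slides address.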

9/36 belief updating
(dialog and confidence scores repeated from slide 7/36)
arrival = {Seoul / 0.65} → f → arrival = ?
arrival = { … } departure = { … }

10/36 belief updating: problem statement
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
arrival = {Seoul / 0.65} → f → arrival = ?
- given
  - an initial belief B_initial(C) over concept C
  - a system action SA(C)
  - a user response R
- construct an updated belief
  - B_updated(C) ← f(B_initial(C), SA(C), R)

11/36 outline
- related work
- proposed approach
- data
- experiments and results
- effects on global performance
- conclusion and future work
related work : proposed approach : data : experiments and results : global performance : conclusion

12/36 detecting misunderstandings and corrections
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
Conf=0.35, Corr=0.47; arrival = {Seoul / 0.65} → arrival = ?
- confidence annotation
  - word-level [Cox, Chase, Bansal, Ravishankar, etc]
  - semantic confidence annotation [Walker, San-Segundo, Bohus, etc]
- correction detection [Litman, Swerts, Hirschberg, Krahmer, Levow]
  - detect when the user corrects the system

13/36 current solutions for tracking beliefs
- most systems only track single values
  - new values overwrite old values
- use simple heuristic rules
  - explicit confirmation
    S: did you say you wanted to fly to Seoul?
    yes → trust hypothesis
    no → delete hypothesis
    “other” → non-understanding
  - implicit confirmation
    S: traveling to Seoul … what day did you need to travel?
    rely on new values overwriting old values
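The heuristic rules above can be written down in a few lines. This is a minimal sketch, not any system's actual code: the `Belief` dataclass, the string labels for actions and responses, and the choice to leave the belief unchanged on a non-understanding are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Belief:
    """Single-value belief, as tracked by most systems."""
    value: Optional[str] = None
    trusted: bool = False


def heuristic_update(belief: Belief, action: str, response: str,
                     new_value: Optional[str] = None) -> Belief:
    """Apply the simple rules: yes/no after explicit confirmation, overwrite otherwise."""
    if action == "explicit_confirm":
        if response == "yes":
            return Belief(belief.value, trusted=True)   # trust hypothesis
        if response == "no":
            return Belief(None, trusted=False)          # delete hypothesis
        return belief  # "other": non-understanding (assumed: belief left as is)
    if action == "implicit_confirm":
        if new_value is not None:
            return Belief(new_value, trusted=False)     # new value overwrites old
        return belief
    raise ValueError(f"unknown action: {action}")


if __name__ == "__main__":
    # The misrecognized "berlin" from the example dialog overwrites "Seoul".
    print(heuristic_update(Belief("Seoul"), "implicit_confirm", "other", "Berlin"))
```

The usage line shows the weakness of the overwrite rule: a low-confidence misrecognition replaces the previous value outright, with no way to weigh the two hypotheses against each other.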


15/36 belief updating: problem statement
(dialog excerpt and diagram repeated from slide 10/36)
- given
  - an initial belief B_initial(C) over concept C
  - a system action SA(C)
  - a user response R
- construct an updated belief
  - B_updated(C) ← f(B_initial(C), SA(C), R)

16/36 belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
- most accurate representation
  - probability distribution over the set of possible values
  - departure: ABERDEEN, TX; ABILENE, TX; ALBANY, NY; ALBUQUERQUE, NM; ALLENTOWN, PA; ALEXANDRIA, LA; ALLAKAKET, AK; ALLIANCE, NE; ALPENA, MI; ALPINE, TX; YUMA, AZ
- however
  - system “hears” only a small number of conflicting values for a concept throughout a session
  - max = 3 conflicting values heard
  - only in 7% of cases, more than 1 value heard

17/36 belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
- compressed belief representation
  - k hypotheses + other
- dynamically add and drop hypotheses
  - remember m hypotheses, add n new ones (m+n=k)
- B…(C) is a multinomial variable of degree k+1
example: departure_city [k=3, m=2, n=1]
  hypotheses: Austin, Boston, Houston, other
  S: Did you say you were flying from Austin?
  U: [NO ASPEN] (new value heard: Aspen)
  S: flying from Aspen… what is your destination?
  U: [NO NO I DIDN’T THAT THAT] (new value heard: Ø)
  hypothesis sets shown on the slide: Boston, Aspen, other; Boston, Austin, other
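The bookkeeping behind "remember m hypotheses, add n new ones" can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the function names `compress` and `hypothesis_slots`, the rule of keeping the m most probable hypotheses, and the probabilities in the usage example are all hypothetical, and the example does not try to reproduce the exact hypothesis sets shown on the slide.

```python
from typing import Dict, List

OTHER = "other"


def compress(belief: Dict[str, float], m: int) -> Dict[str, float]:
    """Keep the m most probable named hypotheses; fold the remaining mass into 'other'."""
    named = {value: p for value, p in belief.items() if value != OTHER}
    top = sorted(named, key=lambda value: named[value], reverse=True)[:m]
    kept = {value: named[value] for value in top}
    kept[OTHER] = max(0.0, 1.0 - sum(kept.values()))
    return kept


def hypothesis_slots(belief: Dict[str, float], heard: List[str],
                     m: int, n: int) -> List[str]:
    """Names of the k+1 outcomes (m remembered + up to n new + other) for the next update."""
    kept = compress(belief, m)
    fresh: List[str] = []
    for value in heard:
        if value not in kept and value not in fresh:
            fresh.append(value)
    return [v for v in kept if v != OTHER] + fresh[:n] + [OTHER]


if __name__ == "__main__":
    # Hypothetical probabilities for departure_city with k=3, m=2, n=1.
    belief = {"Austin": 0.5, "Boston": 0.3, "Houston": 0.1, OTHER: 0.1}
    print(compress(belief, m=2))
    print(hypothesis_slots(belief, heard=["Aspen"], m=2, n=1))
```

The sketch only names the k+1 outcomes; assigning probabilities to them after the user's response is the job of the update function f.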

18/36 system action
B_updated(C) ← f(B_initial(C), SA(C), R)
- request
  S: When would you like to take this flight?
  U: Friday [FRIDAY] / 0.65
- explicit confirmation
  S: Did you say you wanted to fly this Friday?
  U: Yes [GUEST] / 0.30
- implicit confirmation
  S: A flight for Friday … at what time?
  U: At ten a.m. [AT TEN A_M] / 0.86
- no action / unexpected update
  S: okay. I will complete the reservation. Please tell me your name or say ‘guest user’ if you are not a registered user.
  U: guest user [THIS TUESDAY] / 0.55

19/36 user response
B_updated(C) ← f(B_initial(C), SA(C), R)
- acoustic / prosodic: acoustic and language scores, duration, pitch information, voiced-to-unvoiced ratio, speech rate, initial pause
- lexical: number of words, presence of words highly correlated with corrections or acknowledgements
- grammatical: number of slots (new and repeated), goodness-of-parse scores
- dialog: dialog state, turn number, expectation match, timeout, barge-in, concept identity
- priors: priors for concept values
- confusability: how confusable concept values are

20/36 approach
B_updated(C) ← f(B_initial(C), SA(C), R)
- multinomial regression problem
  - multinomial generalized linear model
  - sample efficient
  - stepwise approach: feature selection, BIC to control over-fitting
- one separate model for each system action
  - B_updated(C) ← f_SA(C)(B_initial(C), R)
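To make the two ingredients above concrete, the sketch below shows how a multinomial model with a logit link turns a feature vector into a distribution over the k+1 outcomes, and how BIC penalizes extra parameters during stepwise feature selection. The logit link, the function names, and the weights and features in the usage example are assumptions for illustration; the talk does not give the fitted models.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Convert one linear score per outcome into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights: List[List[float]], features: List[float]) -> List[float]:
    """Multinomial logit: one weight vector per outcome (k hypotheses + other)."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)


def bic(log_likelihood: float, num_params: int, num_samples: int) -> float:
    """Bayesian information criterion; lower is better."""
    return num_params * math.log(num_samples) - 2.0 * log_likelihood


if __name__ == "__main__":
    # Hypothetical weights for k=2 + other; features = [bias, confidence score].
    weights = [[0.5, 2.0], [-0.2, 1.0], [0.0, 0.0]]
    print(predict(weights, [1.0, 0.35]))
    # A feature is kept during stepwise selection only if it lowers BIC.
    print(bic(-120.0, 4, 1000) < bic(-121.0, 3, 1000))
```

Training one such model per system action corresponds to the f_SA(C) form on the slide: the action selects which weight matrix is applied.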


22/36 data
- collected with RoomLine
  - a phone-based mixed-initiative spoken dialog system
  - conference room reservation
- explicit and implicit confirmations
- simple heuristic rules for belief updating
  - explicit confirm: yes / no
  - implicit confirm: new values overwrite old ones

23/36 corpus
- user study
  - 46 participants (first-time users)
  - 10 scenario-based interactions each
- corpus
  - 449 sessions, 8848 user turns
  - orthographically transcribed
  - manually annotated: misunderstandings, corrections, correct concept values


25/36 models
- k=2 + other (m=1, n=1)
- k=3 + other (m=2, n=1)
- k=4 + other (m=3, n=1)
- full model: all features
- basic model: all features except priors and confusability
- runtime model: all features available at runtime

26/36 baselines
- initial baseline: accuracy of system beliefs before the update
- heuristic baseline: accuracy of heuristic update rule used by the system
- correction baseline: accuracy if we knew exactly when the user corrects the system

27/36 results for k=2 hyps + other
[Figure: four bar-chart panels, one per system action: explicit confirm, implicit confirm, request, other. Bars in each panel: initial baseline (i), heuristic baseline (h), basic model (BM), full model (FM), runtime model (RM); the explicit confirm and implicit confirm panels also show the correction baseline (c). Bar values are not recoverable from the transcript.]

28/36 a question remains … does this really matter?


30/36 a new user study …
- implemented models in RavenClaw
- 40 participants, first-time, non-native users
  - improvements more likely at high word-error-rates
- 10 scenario-driven interactions each
- between-subjects; 2 gender-balanced groups
  - control: RoomLine using heuristic update rules
  - treatment: RoomLine using runtime models

31/36 effect on task success
- logistic ANOVA on task success
  - logit(TaskSuccess) ← ∙WER ∙Condition (numeric coefficients and p-value not preserved in the transcript)
[Figure: probability of task success (0% to 100%) against word error rate (0% to 100%), treatment vs. control. Values marked on the plot: 16% word error rate; 30% word error rate; 78%, 78%, 64%.]
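For readers unfamiliar with the notation, the fitted model has the general form of a logistic regression. The block below spells that form out with symbolic coefficients only; the intercept \beta_0, the coefficient names, and the coding of Condition as 0 for control and 1 for treatment are assumptions, and no numeric values from the study are implied.

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p}
  = \beta_0 + \beta_1\,\mathrm{WER} + \beta_2\,\mathrm{Condition},
\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1\,\mathrm{WER} + \beta_2\,\mathrm{Condition})}}
```

Here p is the probability of task success; under this coding, a positive \beta_2 would mean the treatment condition raises the odds of success at any fixed word error rate.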

32/36 how about efficiency?
- ANOVA on task duration for successful tasks
  - Duration ← ∙WER ∙Condition (numeric coefficients and p-value not preserved in the transcript)
- significant improvement
  - equivalent to 7.9% absolute reduction in word-error


34/36 summary
U: [CHICAGO] / 0.72
S: traveling from Chicago. Where would you like to go?
U: [SEOUL] / 0.65
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
S: traveling in the afternoon. Okay what day would you be departing chicago
arrival = {Seoul / 0.65} → f → arrival = ?
arrival = { … } departure = { … }
- approach for constructing accurate beliefs
- integrate information across multiple turns
- large gains in task success and efficiency

35/36 other advantages
- learns from data
  - tuned to the domain in which it operates
- sample efficient / scalable
  - performs a local one-turn optimization
  - works independently on concepts
- portable
  - decoupled from dialog task specification
  - no strong assumptions about dialog management

36/36 future work
- integrate information from n-best list
- integrate other high-level knowledge
  - domain-specific constraints
  - inter-concept dependencies
- unsupervised / implicit learning
- domain-specificity

37/36 thank you! questions …

38/36 improvements at different WER
[Figure: absolute improvement in task success against word-error-rate.]

39/36 user study
- 10 scenarios, fixed order
- presented graphically (explained during briefing)
- participants compensated per task success

40/36 informative features
- priors and confusability
- initial confidence scores
- concept identity
- barge-in
- expectation match
- repeated grammar slots