A “k-hypotheses + other” belief updating model
Dan Bohus, Alex Rudnicky
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213

Presentation transcript:

a “k-hypotheses + other” belief updating model
Dan Bohus, Alex Rudnicky
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA
acknowledgements: Tim Paek, Eric Horvitz (Microsoft Research)

2/25 motivation
spoken language interfaces are still very brittle [Parade, Sunday, March 26]

3/25
S: What city are you leaving from?
U: [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: [SEOUL]
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
S: traveling in the afternoon… okay, what day would you be departing Chicago?
U: [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: [flight destination mr WEEKEND]
S: traveling on Saturday, August 12th… I have a flight departing Chicago at 1:40pm arrives Seoul at ……
confidence scores: / 0.72 / 0.35 / 0.58 / 0.65 / 0.28
misunderstandings
user utterances:
• Chicago
• Huntsville
• no no I’m traveling to Birmingham
• the tenth of August
• my destination is Birmingham
arrival = {Seoul / 0.65}

4/25
S: What city are you leaving from?
U: [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: [SEOUL]
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
S: traveling in the afternoon… okay, what day would you be departing Chicago?
U: [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: [flight destination mr WEEKEND]
S: traveling on Saturday, August 12th… I have a flight departing Chicago at 1:40pm arrives Seoul at ……
confidence scores: / 0.72 / 0.35 / 0.58 / 0.65 / 0.28
misunderstandings
arrival = {Seoul / 0.65} → f → arrival = ?
arrival = { … }
departure = { … }

5/25 belief updating: problem statement
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
arrival = {Seoul / 0.65} → f → arrival = ?
• given
  - an initial belief B_initial(C) over concept C
  - a system action SA(C)
  - a user response R
• construct an updated belief
  - B_updated(C) ← f(B_initial(C), SA(C), R)

6/25 outline
• introduction
• current solutions
• approach
• experimental results
• effects on global performance
• conclusion and future work
intro : current solutions : approach : experimental results : global performance : conclusion

7/25 current solutions
S: traveling from Chicago. Where would you like to go?
U: [SEOUL]
S: traveling to Seoul… what day did you need to travel?
U: [THE TRAVELING to berlin P_M]
confidence scores shown: / 0.65 / 0.35 / 0.72
arrival = {Seoul / 0.65} → f → arrival = ?
• confidence scores / detecting misunderstandings [Cox, Chase, Bansal, Hazen, Ravishankar, Walker, San-Segundo, Bohus]
• detecting corrections [Litman, Swerts, Hirschberg, Krahmer, Levow]
• track single values
• use simple heuristic belief updating rules
  - explicit confirmations: yes / no
  - implicit confirmations: new values overwrite old values
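
The heuristic rules listed on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions, not the rule set of any particular system: the function name `heuristic_update`, the `(value, confidence)` tuple, and the choices to set confidence to 1.0 on "yes" and to discard the old value on "no" are all hypothetical.

```python
from typing import Optional, Tuple

# A belief is a single tracked value with a confidence score, or None.
Belief = Optional[Tuple[str, float]]


def heuristic_update(belief: Belief, action: str,
                     answer: Optional[str] = None,
                     new_value: Belief = None) -> Belief:
    """Hypothetical single-value heuristic belief update.

    action:    "explicit_confirm" or "implicit_confirm"
    answer:    "yes" / "no" after an explicit confirmation (None if unclear)
    new_value: (value, confidence) heard in the user response, if any
    """
    if action == "explicit_confirm":
        if answer == "yes" and belief is not None:
            return (belief[0], 1.0)   # assumption: "yes" fully confirms
        if answer == "no":
            return new_value          # assumption: "no" discards the old value
        return belief                 # unclear answer: belief unchanged
    if action == "implicit_confirm":
        # New values overwrite old values; otherwise the old value stands.
        return new_value if new_value is not None else belief
    return belief


# No new arrival value was recognized after the implicit confirmation,
# so the rule keeps Seoul at its original confidence.
print(heuristic_update(("Seoul", 0.65), "implicit_confirm"))
```

Under these rules the confidence score of the user's response plays no role unless a new value is recognized, which is the limitation the rest of the talk addresses.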

9/25 belief updating: problem statement
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
arrival = {Seoul / 0.65} → f → arrival = ?
• given
  - an initial belief B_initial(C) over concept C
  - a system action SA(C)
  - a user response R
• construct an updated belief
  - B_updated(C) ← f(B_initial(C), SA(C), R)

10/25 belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
• probability distribution over the set of possible values
  departure: ABERDEEN, TX; ABILENE, TX; ALBANY, NY; ALBUQUERQUE, NM; ALLENTOWN, PA; ALEXANDRIA, LA; ALLAKAKET, AK; ALLIANCE, NE; ALPENA, MI; ALPINE, TX; YUMA, AZ
• however
  - system “hears” only a small number of conflicting values for a concept throughout a session
  - max = 3 conflicting values heard

11/25 belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
• compressed belief representation
  - k hypotheses + other
• dynamically add and drop hypotheses
  - remember m hypotheses, add n new ones (m+n=k)
• B_…(C) is a multinomial variable of degree k+1
example: departure_city [k=3, m=2, n=1]
  belief: Austin | Boston | Houston | other
  S: Did you say you were flying from Austin?
  U: [NO ASPEN]    new hypothesis: Aspen
  S: flying from Aspen… what is your destination?
  U: [NO NO I DIDN’T THAT THAT]    new hypothesis: Ø
  beliefs shown: Boston | Aspen | other; Boston | Austin | other
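
One way to picture the “remember m, add n” bookkeeping is the sketch below. It is a minimal illustration, not the system's implementation: the function names `compress` and `outcome_labels`, the dictionary representation, and the probabilities in the example are all assumptions made for the example. Only the retained labels matter here; the updated probabilities themselves would come from the learned model.

```python
from typing import Dict, List

OTHER = "other"


def compress(belief: Dict[str, float], m: int) -> Dict[str, float]:
    """Keep the m most probable named hypotheses; fold the rest into 'other'."""
    named = {v: p for v, p in belief.items() if v != OTHER}
    kept = sorted(named, key=lambda v: named[v], reverse=True)[:m]
    dropped_mass = sum(p for v, p in named.items() if v not in kept)
    compressed = {v: named[v] for v in kept}
    compressed[OTHER] = belief.get(OTHER, 0.0) + dropped_mass
    return compressed


def outcome_labels(belief: Dict[str, float], new_values: List[str],
                   m: int, n: int) -> List[str]:
    """Labels of the k+1 outcomes for the next update: m retained
    hypotheses, up to n new ones from the user response, plus 'other'."""
    retained = [v for v in compress(belief, m) if v != OTHER]
    added = [v for v in new_values if v not in retained][:n]
    return retained + added + [OTHER]


# Hypothetical probabilities for the departure_city example (k=3, m=2, n=1).
belief = {"Austin": 0.5, "Boston": 0.3, "Houston": 0.1, OTHER: 0.1}
print(compress(belief, m=2))
print(outcome_labels(belief, ["Aspen"], m=2, n=1))
```

Folding dropped hypotheses into `other` keeps the belief a proper distribution while bounding its size at k+1 entries, whatever the cardinality of the concept.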

12/25 system action
B_updated(C) ← f(B_initial(C), SA(C), R)
request
  S: When would you like to take this flight?
  U: Friday [FRIDAY] / 0.65
explicit confirmation
  S: Did you say you wanted to fly this Friday?
  U: Yes [GUEST] / 0.30
implicit confirmation
  S: A flight for Friday … at what time?
  U: At ten a.m. [AT TEN A_M] / 0.86
no action / unexpected update
  S: okay. I will complete the reservation. Please tell me your name or say ‘guest user’ if you are not a registered user.
  U: guest user [THIS TUESDAY] / 0.55

13/25 user response
B_updated(C) ← f(B_initial(C), SA(C), R)
acoustic / prosodic: acoustic and language scores, duration, pitch information, voiced-to-unvoiced ratio, speech rate, initial pause
lexical: number of words, presence of words highly correlated with corrections or acknowledgements
grammatical: number of slots (new and repeated), goodness-of-parse scores
dialog: dialog state, turn number, expectation match, timeout, barge-in, concept identity
priors: priors for concept values
confusability: how confusable concept values are

14/25 approach
B_updated(C) ← f(B_initial(C), SA(C), R)
• multinomial regression problem
• multinomial generalized linear model
• sample efficient
• stepwise approach
• feature selection
• one separate model for each system action
  - B_updated(C) ← f_SA(C)(B_initial(C), R)
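
A rough sketch of what “one multinomial model per system action” could look like follows. It assumes a softmax (multinomial logit) link. The outcome labels `initial_1`, `new_1` and `other`, the `bias` feature, and every coefficient are hypothetical placeholders, not values from the trained models. Only the feature names `answer_type[YES]` and `answer_type[NO]` are borrowed from the model listings in the backup slides.

```python
import math
from typing import Dict

Weights = Dict[str, Dict[str, float]]  # outcome -> feature -> coefficient


def predict(weights: Weights, features: Dict[str, float]) -> Dict[str, float]:
    """Multinomial logit: a linear score per outcome, then a softmax."""
    scores = {outcome: sum(c * features.get(f, 0.0) for f, c in coefs.items())
              for outcome, coefs in weights.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {o: math.exp(s - top) for o, s in scores.items()}
    total = sum(exps.values())
    return {o: e / total for o, e in exps.items()}


# One model per system action; only explicit confirmation is filled in.
# All coefficients below are invented for illustration.
MODELS: Dict[str, Weights] = {
    "explicit_confirm": {
        "initial_1": {},  # reference outcome, score fixed at 0
        "new_1": {"bias": -1.0, "answer_type[YES]": -2.0, "answer_type[NO]": 2.0},
        "other": {"bias": -1.0, "answer_type[YES]": -3.0, "answer_type[NO]": 3.0},
    },
}


def update_belief(action: str, features: Dict[str, float]) -> Dict[str, float]:
    """B_updated(C) <- f_SA(C)(B_initial(C), R): pick the model by action."""
    return predict(MODELS[action], features)


after_yes = update_belief("explicit_confirm", {"bias": 1.0, "answer_type[YES]": 1.0})
after_no = update_belief("explicit_confirm", {"bias": 1.0, "answer_type[NO]": 1.0})
print(after_yes)
print(after_no)
```

Selecting the model by system action means the same feature can carry a different weight after a request than after a confirmation.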

16/25 data
• RoomLine
  - conference room reservations
  - explicit and implicit confirmations
• user study
  - 46 participants
  - 10 scenario-based interactions each
• corpus
  - 449 sessions, 8848 user turns
  - transcribed & annotated: misunderstandings, corrections, correct concept values

17/25 model performance
• initial baseline (i) [error before update]
• heuristic baseline (h) [error after heuristic update]
• Model (M) [k=2, all features]
• correction baseline (c) [error if we had perfect correction detection]
[bar charts of error rate for each system action (explicit confirm, implicit confirm, request, no action), comparing i, h, M and, where shown, c]

19/25 a new user study …
• implemented models in the system
• 2nd, between-subjects experiment
  - control: using heuristic update rules
  - treatment: using belief updating models
• 40 participants, non-native users
  - improvements more likely at high word-error-rates

20/25 effect on task success
• logistic ANOVA on task success
logit(TaskSuccess) ← ∙WER ∙Condition
p=
[plot: probability of task success (0% to 100%) against word error rate (0% to 100%), treatment vs. control; annotations: 16% word error rate, 30% word error rate, 78%, 78%, 64%]
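
The shape of the fitted model, a logistic regression of task success on word error rate and experimental condition, can be sketched as below. The coefficients did not survive in this transcript, so the values used here are placeholders invented purely to exercise the function. They are not the study's estimates, and the intercept term `b0` is itself an assumption.

```python
import math


def task_success_probability(wer: float, treatment: bool,
                             b0: float, b_wer: float, b_condition: float) -> float:
    """Inverse logit of b0 + b_wer * WER + b_condition * Condition,
    where Condition is 1 for the treatment group and 0 for control."""
    condition = 1.0 if treatment else 0.0
    logit = b0 + b_wer * wer + b_condition * condition
    return 1.0 / (1.0 + math.exp(-logit))


# Placeholder coefficients (NOT from the study): success falls with WER,
# and the treatment condition shifts the curve upward.
B0, B_WER, B_CONDITION = 2.0, -5.0, 0.7

for wer in (0.1, 0.3, 0.5):
    control = task_success_probability(wer, False, B0, B_WER, B_CONDITION)
    treated = task_success_probability(wer, True, B0, B_WER, B_CONDITION)
    print(f"WER={wer:.0%}  control={control:.2f}  treatment={treated:.2f}")
```

A positive condition coefficient shifts the whole curve to the right, which is why an effect of this kind can be restated as an equivalent reduction in word error rate.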

21/25 how about efficiency?
• ANOVA on task duration for successful tasks
Duration ← ∙WER ∙Condition
p=
• significant improvement
• equivalent to 7.9% absolute reduction in word-error

23/25 summary
U: [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: [SEOUL]
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
S: traveling in the afternoon. Okay what day would you be departing Chicago
confidence scores: / 0.72 / 0.35 / 0.65
arrival = {Seoul / 0.65} → f → arrival = ?
arrival = { … }
departure = { … }
• approach for constructing accurate beliefs
• integrate information across multiple turns
• significant gains in task success and efficiency

24/25 other advantages
• learns from data
  - tuned to the domain in which it operates
• sample efficient / scalable
  - local one-turn optimization, concepts are independent
  - RoomLine operates with 29 concepts, cardinality: 2 to several hundreds
• portable
  - decoupled from dialog task specification
  - no assumptions about dialog management

25/25 future work
• integrate information from n-best list
• integrate other high-level knowledge
  - domain-specific constraints
  - inter-concept dependencies
• investigate technique in other domains

26/25 thank you! questions …

27/25 improvements at different WER
[plot: absolute improvement in task success against word-error-rate]

28/25 user study
• 10 scenarios, fixed order
• presented graphically (explained during briefing)
• participants compensated per task success

29/25 informative features
• priors and confusability
• initial confidence scores
• concept identity
• barge-in
• expectation match
• repeated grammar slots

30/25 Models (k=2, runtime features)
# The model for the explicit confirm action
columns: new_1, other
LR_MODEL(EC)
k =
answer_type[YES] =
answer_type[NO] =
answer_type[OTHER] =
concept_id(equip) =
i_th_confusability =
ih_diff_lexical_one_word =
lexw1[SMALL] =
response_new_hyps_in_selh =
END

31/25 Models (k=2, runtime features)
# The model for the implicit confirm action
columns: new_1, other
LR_MODEL(IC)
mark_confirm =
mark_disconfirm =
i_th_conf =
i_th_confusability =
k =
lex[THREE] =
response_new_hyps_in_selh =
turn_number =
END

32/25 Models (k=2, runtime features)
# The model for the request action
columns: new_1, other
LR_MODEL(REQ)
k =
barge_in =
concept_id(date) =
concept_id(user_name) =
dialog_state[RequestSpecificTimes] =
ih_diff_lexical =
initial_num_hyps_>_0 =
total_num_parses =
ur_selh_new_1_conf =
ur_selh_new_1_confusability =
ur_selh_new_1_prior =
ur_selh_new_1_prior_>_1 =
END