“k hypotheses + other” belief updating in spoken dialog systems
Dialogs on Dialogs Talk, March 2006
Dan Bohus
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213
www.cs.cmu.edu/~dbohus | dbohus@cs.cmu.edu
2 problem
- spoken language interfaces lack robustness when faced with understanding errors
- errors stem mostly from speech recognition
- typical word error rates: 20-30%
- significant negative impact on interactions
3 guarding against understanding errors
- use confidence scores
  - machine learning approaches for detecting misunderstandings [Walker, Litman, San-Segundo, Wright, and others]
- engage in confirmation actions
  - explicit confirmation
    S: did you say you wanted to fly to Seoul?
    yes → trust hypothesis; no → delete hypothesis; “other” → non-understanding
  - implicit confirmation
    S: traveling to Seoul … what day did you need to travel?
    rely on new values overwriting old values
4 today’s talk: construct accurate beliefs by integrating information over multiple turns in a conversation
S: Where would you like to go?
U: Huntsville [SEOUL / 0.65]   →   destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: no no I’m traveling to Birmingham [THE TRAVELING TO BERLIN P_M / 0.60]   →   destination = {?}
5 belief updating: problem statement
S: traveling to Seoul. What day did you need to travel?   destination = {seoul/0.65}
U: [THE TRAVELING TO BERLIN P_M / 0.60]   destination = {?}

given
- an initial belief B_initial(C) over concept C
- a system action SA
- a user response R
construct an updated belief B_updated(C) ← f(B_initial(C), SA, R)
6 outline
- proposed approach
- data
- experiments and results
- effect on dialog performance
- conclusion
7 belief updating: problem statement
S: traveling to Seoul. What day did you need to travel?   destination = {seoul/0.65}
U: [THE TRAVELING TO BERLIN P_M / 0.60]   destination = {?}

given
- an initial belief B_initial(C) over concept C
- a system action SA(C)
- a user response R
construct an updated belief B_updated(C) ← f(B_initial(C), SA(C), R)
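For concreteness, here is a minimal, hypothetical sketch (names are illustrative, not from the talk) of the heuristic update policy that the deck describes and that the learned f is meant to replace: on an explicit confirmation a yes/no answer keeps or deletes the hypothesis, and otherwise a newly heard value simply overwrites the old one.

```python
from typing import Dict, Optional

Belief = Dict[str, float]  # concept value -> confidence score

def heuristic_update(initial: Belief, system_action: str,
                     new_value: Optional[str], new_confidence: float,
                     said_yes: bool, said_no: bool) -> Belief:
    """Baseline heuristic update rules (not the learned model f)."""
    if system_action == "explicit_confirm":
        if said_yes:
            return initial          # trust the confirmed hypothesis
        if said_no:
            return {}               # delete the hypothesis
        return initial              # anything else -> non-understanding, keep belief
    # request / implicit confirmation: a newly heard value overwrites the old one
    if new_value is not None:
        return {new_value: new_confidence}
    return initial
```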
8 belief representation   B_updated(C) ← f(B_initial(C), SA(C), R)
- most accurate representation: a probability distribution over the set of possible values
- however, the system will “hear” only a small number of conflicting values for a concept within a dialog session
  - in our data, at most 3 conflicting values were heard
  - in only 6.9% of cases was more than 1 value heard
9 compressed belief representation   B_updated(C) ← f(B_initial(C), SA(C), R)
- k hypotheses + other
- at each turn, the system retains the top m initial hypotheses and adds n new hypotheses from the input (m + n = k)
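As a reading aid, a small hypothetical sketch of the selection step implied by the slide above: keep the top m hypotheses already in the belief, add up to n new hypotheses heard in the current input, and fold everything else into “other”. The probabilities over these k + 1 outcomes are then assigned by the learned model, not here; function and variable names are illustrative.

```python
from typing import Dict, List

def tracked_hypotheses(initial_belief: Dict[str, float],
                       input_hypotheses: Dict[str, float],
                       m: int, n: int) -> List[str]:
    """Values the compressed belief keeps at this turn: top-m old + top-n new + 'other'."""
    top_initial = sorted(initial_belief, key=initial_belief.get, reverse=True)[:m]
    new_values = [v for v in sorted(input_hypotheses, key=input_hypotheses.get, reverse=True)
                  if v not in top_initial][:n]
    return top_initial + new_values + ["other"]
```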
10 belief representation   B_updated(C) ← f(B_initial(C), SA(C), R)
- B(C) is modeled as a multinomial variable over {h_1, h_2, …, h_k, other}
- B(C) = ⟨c_h1, c_h2, …, c_hk, c_other⟩, where c_h1 + c_h2 + … + c_hk + c_other = 1
- belief updating can then be cast as a multinomial regression problem:
  B_updated(C) ← B_initial(C) + SA(C) + R
11 system action   B_updated(C) ← f(B_initial(C), SA(C), R)
- request
  S: For when do you want the room?
  U: Friday [FRIDAY / 0.65]
- explicit confirmation
  S: Did you say you wanted a room for Friday?
  U: Yes [GUEST / 0.30]
- implicit confirmation
  S: a room for Friday … starting at what time?
  U: starting at ten a.m. [STARTING AT TEN A_M / 0.86]
- unplanned implicit confirmation
  S: I found 5 rooms available Friday from 10 until noon. Would you like a small or a large room?
  U: not Friday, Thursday [FRIDAY THURSDAY / 0.25]
- no action / unexpected update
  S: okay. I will complete the reservation. Please tell me your name or say ‘guest user’ if you are not a registered user.
  U: guest user [THIS TUESDAY / 0.55]
12 user response   B_updated(C) ← f(B_initial(C), SA(C), R)
- acoustic / prosodic: acoustic and language scores, duration, pitch (min, max, mean, range, std.dev, min and max slope, plus normalized versions), voiced-to-unvoiced ratio, speech rate, initial pause, etc.
- lexical: number of words, lexical terms highly correlated with corrections or acknowledgements (selected via mutual information)
- grammatical: number of slots (new and repeated), parse fragmentation, parse gaps, etc.
- dialog: dialog state, turn number, expectation match, new value for concept, timeout, barge-in, concept identity
- priors: priors for concept values (manually constructed by a domain expert for 3 of 29 concepts: date, start_time, end_time; uniform assumed otherwise)
- confusability: empirically derived confusability scores
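Purely as illustration (the field names below are mine, not from the slides), the user-response features above could be bundled per turn along these lines:

```python
from dataclasses import dataclass

@dataclass
class ResponseFeatures:
    """One turn's user-response features R, grouped as on the slide above."""
    # acoustic / prosodic
    acoustic_score: float
    language_score: float
    duration: float
    pitch_mean: float
    speech_rate: float
    initial_pause: float
    # lexical
    num_words: int
    correction_marker: bool       # lexical term correlated with corrections
    # grammatical
    new_slots: int
    repeated_slots: int
    parse_fragments: int
    # dialog
    dialog_state: str
    turn_number: int
    expectation_match: bool
    barge_in: bool
    # priors / confusability for the hypothesized value
    value_prior: float
    confusability: float
```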
13 approach   B_updated(C) ← f(B_initial(C), SA(C), R)
problem: B_updated(C) ← f(B_initial(C), SA(C), R)
approach: multinomial generalized linear model
- regression model with a multinomial dependent variable
- sample efficient
- stepwise feature selection, with BIC to control over-fitting
- one model for each system action: B_updated(C) ← f_SA(C)(B_initial(C), R)
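The slides specify a multinomial generalized linear model with stepwise, BIC-controlled feature selection, fit separately for each system action. As a rough stand-in (not the authors' implementation, and without the stepwise/BIC selection), the per-action model could be sketched with an off-the-shelf multinomial logistic regression:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_update_model(X_sa: np.ndarray, y_sa: np.ndarray) -> LogisticRegression:
    """Fit one update model for a single system action SA.

    X_sa : features for turns where action SA was taken
           (initial belief features + user-response features R)
    y_sa : index of the correct outcome in {h_1, ..., h_k, other} for each turn
    With the default lbfgs solver this is a multinomial (softmax) model.
    """
    return LogisticRegression(max_iter=1000).fit(X_sa, y_sa)

def updated_belief(model: LogisticRegression, x_turn: np.ndarray) -> np.ndarray:
    """Updated probability distribution over {h_1, ..., h_k, other} for one turn."""
    return model.predict_proba(x_turn.reshape(1, -1))[0]
```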
14 outline
- proposed approach
- data
- experiments and results
- effect on dialog performance
- conclusion
15 data
- collected with RoomLine
  - a phone-based, mixed-initiative spoken dialog system for conference room reservation
  - explicit and implicit confirmations
  - simple heuristic rules for belief updating
    - explicit confirm: yes / no
    - implicit confirm: new values overwrite old ones
16 corpus
- user study
  - 46 participants (naïve users)
  - 10 scenario-based interactions each
  - compensated per task success
- corpus
  - 449 sessions, 8848 user turns
  - orthographically transcribed
  - manually annotated: misunderstandings, corrections, correct concept values
17 outline
- proposed approach
- data
- experiments and results
- effect on dialog performance
- conclusion
18 baselines
- initial baseline: accuracy of system beliefs before the update
- heuristic baseline: accuracy of the heuristic update rule used by the system
- oracle baseline: accuracy if we knew exactly when the user corrects
19 informative features
- k=2 hypotheses + other
- priors and confusability
- initial confidence score
- concept identity
- barge-in
- expectation match
- repeated grammar slots
20 outline
- proposed approach
- data
- experiments and results
- effect on dialog performance
- conclusion
21 a question remains …
… does this really matter? what is the effect on global dialog performance?
22 let’s run an experiment
- guinea pigs from Speech Lab for exp: $0
- getting change from guys in the lab: $2/$3/$5
- real subjects for the experiment: $25
- picture with advisor of the VERY last exp at CMU: priceless!!!!
[courtesy of Mohit Kumar]
23 a new user study …
- implemented models in RavenClaw, performed a new user study
- 40 participants, first-time users
  - 10 scenario-driven interactions each
  - non-native speakers of North-American English (improvements more likely at higher WER, supported by empirical evidence)
- between-subjects design; 2 gender-balanced groups
  - control: RoomLine using heuristic update rules
  - treatment: RoomLine using runtime models
24 effect on task success
[chart] task success: 73.6% (control) vs. 81.3% (treatment), even though the average per-user WER was 21.9% (control) vs. 24.2% (treatment)
25 effect on task success … a closer look
Task Success ← 2.09 - 0.05∙WER + 0.69∙Condition   (p=0.001)
[plot: probability of task success vs. word error rate for control and treatment; at 30% WER the fitted model gives roughly 64% (control) vs. 78% (treatment), about the control level at 16% WER]
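To make the fitted model concrete, a small worked computation (assuming a standard logistic regression with WER in percent and Condition = 0 for control, 1 for treatment; the rounded numbers below follow from the coefficients shown on the slide):

```python
import math

def p_success(wer: float, condition: int) -> float:
    """Predicted task success probability: logit(p) = 2.09 - 0.05*WER + 0.69*Condition."""
    logit = 2.09 - 0.05 * wer + 0.69 * condition
    return 1.0 / (1.0 + math.exp(-logit))

for wer in (16, 30):
    print(f"WER={wer}%: control={p_success(wer, 0):.0%}, treatment={p_success(wer, 1):.0%}")
# WER=16%: control=78%, treatment=88%
# WER=30%: control=64%, treatment=78%
```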
26 improvements at different WER
[plot: absolute improvement in task success as a function of word error rate]
27 effect on task duration (for successful tasks)
- ANOVA on task duration for successful tasks
- Duration ← -0.21 + 0.013∙WER - 0.106∙Condition
- significant improvement, equivalent to a 7.9% absolute reduction in WER
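The “equivalent to a 7.9% absolute reduction in WER” figure presumably comes from the ratio of the two coefficients: the duration saved by the treatment condition offsets the duration added by that many percentage points of WER. With the rounded coefficients shown on the slide:

equivalent WER reduction = 0.106 / 0.013 ≈ 8.2 percentage points

which is consistent with the reported 7.9% once the unrounded coefficients are taken into account.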
28 outline
- proposed approach
- data
- experiments and results
- effect on dialog performance
- conclusion
29 summary
- a data-driven approach for constructing accurate system beliefs
  - integrates information across multiple turns
  - bridges the detection of misunderstandings and the detection of corrections
- significantly outperforms current heuristics
- significantly improves effectiveness and efficiency
30 other advantages
- sample efficient: performs a local, one-turn optimization; good local performance leads to good global performance
- scalable: works independently on concepts (29 concepts, varying cardinalities)
- portable: decoupled from the dialog task specification; doesn’t make strong assumptions about dialog management technology
31 thank you! questions …
32 user study
- 10 scenarios, fixed order
- presented graphically (explained during briefing)
- participants compensated per task success