“k hypotheses + other” belief updating in spoken dialog systems
Dialogs on Dialogs Talk, March 2006
Dan Bohus
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213
www.cs.cmu.edu/~dbohus | dbohus@cs.cmu.edu
2 problem
- spoken language interfaces lack robustness when faced with understanding errors
- errors stem mostly from speech recognition
- typical word error rates: 20-30%
- significant negative impact on interactions
3 guarding against understanding errors
- use confidence scores
  - machine learning approaches for detecting misunderstandings [Walker, Litman, San-Segundo, Wright, and others]
- engage in confirmation actions
  - explicit confirmation
    S: did you say you wanted to fly to Seoul?
    yes → trust hypothesis; no → delete hypothesis; “other” → non-understanding
  - implicit confirmation
    S: traveling to Seoul … what day did you need to travel?
    rely on new values overwriting old values
4 today’s talk: construct accurate beliefs by integrating information over multiple turns in a conversation
S: Where would you like to go?
U: Huntsville [SEOUL / 0.65]   →   destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: no no I’m traveling to Birmingham [THE TRAVELING TO BERLIN P_M / 0.60]   →   destination = {?}
5 belief updating: problem statement
S: traveling to Seoul. What day did you need to travel?   destination = {seoul/0.65}
U: [THE TRAVELING TO BERLIN P_M / 0.60]   destination = {?}

given
- an initial belief B_initial(C) over concept C
- a system action SA
- a user response R
construct an updated belief B_updated(C) ← f(B_initial(C), SA, R)
6 outline
- proposed approach
- data
- experiments and results
- effect on dialog performance
- conclusion
7 belief updating: problem statement
S: traveling to Seoul. What day did you need to travel?   destination = {seoul/0.65}
U: [THE TRAVELING TO BERLIN P_M / 0.60]   destination = {?}

given
- an initial belief B_initial(C) over concept C
- a system action SA(C)
- a user response R
construct an updated belief B_updated(C) ← f(B_initial(C), SA(C), R)
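For concreteness, here is a minimal, hypothetical sketch (names are illustrative, not from the talk) of the heuristic update policy that the deck describes and that the learned f is meant to replace: on an explicit confirmation a yes/no answer keeps or deletes the hypothesis, and otherwise a newly heard value simply overwrites the old one.

```python
from typing import Dict, Optional

Belief = Dict[str, float]  # concept value -> confidence score

def heuristic_update(initial: Belief, system_action: str,
                     new_value: Optional[str], new_confidence: float,
                     said_yes: bool, said_no: bool) -> Belief:
    """Baseline heuristic update rules (not the learned model f)."""
    if system_action == "explicit_confirm":
        if said_yes:
            return initial          # trust the confirmed hypothesis
        if said_no:
            return {}               # delete the hypothesis
        return initial              # anything else -> non-understanding, keep belief
    # request / implicit confirmation: a newly heard value overwrites the old one
    if new_value is not None:
        return {new_value: new_confidence}
    return initial
```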
8 belief representation   B_updated(C) ← f(B_initial(C), SA(C), R)
- most accurate representation: a probability distribution over the set of possible values
- however, the system will “hear” only a small number of conflicting values for a concept within a dialog session
  - in our data, at most 3 conflicting values were heard
  - in only 6.9% of cases was more than 1 value heard
9 compressed belief representation   B_updated(C) ← f(B_initial(C), SA(C), R)
- k hypotheses + other
- at each turn, the system retains the top m initial hypotheses and adds n new hypotheses from the input (m + n = k)
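As a reading aid, a small hypothetical sketch of the selection step implied by the slide above: keep the top m hypotheses already in the belief, add up to n new hypotheses heard in the current input, and fold everything else into “other”. The probabilities over these k + 1 outcomes are then assigned by the learned model, not here; function and variable names are illustrative.

```python
from typing import Dict, List

def tracked_hypotheses(initial_belief: Dict[str, float],
                       input_hypotheses: Dict[str, float],
                       m: int, n: int) -> List[str]:
    """Values the compressed belief keeps at this turn: top-m old + top-n new + 'other'."""
    top_initial = sorted(initial_belief, key=initial_belief.get, reverse=True)[:m]
    new_values = [v for v in sorted(input_hypotheses, key=input_hypotheses.get, reverse=True)
                  if v not in top_initial][:n]
    return top_initial + new_values + ["other"]
```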
10 belief representation   B_updated(C) ← f(B_initial(C), SA(C), R)
- B(C) is modeled as a multinomial variable over {h_1, h_2, …, h_k, other}
- B(C) = ⟨c_h1, c_h2, …, c_hk, c_other⟩, where c_h1 + c_h2 + … + c_hk + c_other = 1
- belief updating can then be cast as a multinomial regression problem:
  B_updated(C) ← B_initial(C) + SA(C) + R
11 system action   B_updated(C) ← f(B_initial(C), SA(C), R)
- request
  S: For when do you want the room?
  U: Friday [FRIDAY / 0.65]
- explicit confirmation
  S: Did you say you wanted a room for Friday?
  U: Yes [GUEST / 0.30]
- implicit confirmation
  S: a room for Friday … starting at what time?
  U: starting at ten a.m. [STARTING AT TEN A_M / 0.86]
- unplanned implicit confirmation
  S: I found 5 rooms available Friday from 10 until noon. Would you like a small or a large room?
  U: not Friday, Thursday [FRIDAY THURSDAY / 0.25]
- no action / unexpected update
  S: okay. I will complete the reservation. Please tell me your name or say ‘guest user’ if you are not a registered user.
  U: guest user [THIS TUESDAY / 0.55]
12 user response   B_updated(C) ← f(B_initial(C), SA(C), R)
- acoustic / prosodic: acoustic and language scores, duration, pitch (min, max, mean, range, std.dev, min and max slope, plus normalized versions), voiced-to-unvoiced ratio, speech rate, initial pause, etc.
- lexical: number of words, lexical terms highly correlated with corrections or acknowledgements (selected via mutual information)
- grammatical: number of slots (new and repeated), parse fragmentation, parse gaps, etc.
- dialog: dialog state, turn number, expectation match, new value for concept, timeout, barge-in, concept identity
- priors: priors for concept values (manually constructed by a domain expert for 3 of 29 concepts: date, start_time, end_time; uniform assumed otherwise)
- confusability: empirically derived confusability scores
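Purely as illustration (the field names below are mine, not from the slides), the user-response features above could be bundled per turn along these lines:

```python
from dataclasses import dataclass

@dataclass
class ResponseFeatures:
    """One turn's user-response features R, grouped as on the slide above."""
    # acoustic / prosodic
    acoustic_score: float
    language_score: float
    duration: float
    pitch_mean: float
    speech_rate: float
    initial_pause: float
    # lexical
    num_words: int
    correction_marker: bool       # lexical term correlated with corrections
    # grammatical
    new_slots: int
    repeated_slots: int
    parse_fragments: int
    # dialog
    dialog_state: str
    turn_number: int
    expectation_match: bool
    barge_in: bool
    # priors / confusability for the hypothesized value
    value_prior: float
    confusability: float
```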
13 approach   B_updated(C) ← f(B_initial(C), SA(C), R)
problem: B_updated(C) ← f(B_initial(C), SA(C), R)
approach: multinomial generalized linear model
- regression model with a multinomial dependent variable
- sample efficient
- stepwise feature selection, with BIC to control over-fitting
- one model for each system action: B_updated(C) ← f_SA(C)(B_initial(C), R)
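The slides specify a multinomial generalized linear model with stepwise, BIC-controlled feature selection, fit separately for each system action. As a rough stand-in (not the authors' implementation, and without the stepwise/BIC selection), the per-action model could be sketched with an off-the-shelf multinomial logistic regression:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_update_model(X_sa: np.ndarray, y_sa: np.ndarray) -> LogisticRegression:
    """Fit one update model for a single system action SA.

    X_sa : features for turns where action SA was taken
           (initial belief features + user-response features R)
    y_sa : index of the correct outcome in {h_1, ..., h_k, other} for each turn
    With the default lbfgs solver this is a multinomial (softmax) model.
    """
    return LogisticRegression(max_iter=1000).fit(X_sa, y_sa)

def updated_belief(model: LogisticRegression, x_turn: np.ndarray) -> np.ndarray:
    """Updated probability distribution over {h_1, ..., h_k, other} for one turn."""
    return model.predict_proba(x_turn.reshape(1, -1))[0]
```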
14 outline
- proposed approach
- data
- experiments and results
- effect on dialog performance
- conclusion
15 data
- collected with RoomLine
  - a phone-based, mixed-initiative spoken dialog system for conference room reservation
  - explicit and implicit confirmations
  - simple heuristic rules for belief updating
    - explicit confirm: yes / no
    - implicit confirm: new values overwrite old ones
16 corpus
- user study
  - 46 participants (naïve users)
  - 10 scenario-based interactions each
  - compensated per task success
- corpus
  - 449 sessions, 8848 user turns
  - orthographically transcribed
  - manually annotated: misunderstandings, corrections, correct concept values
17 outline
- proposed approach
- data
- experiments and results
- effect on dialog performance
- conclusion
18 baselines
- initial baseline: accuracy of system beliefs before the update
- heuristic baseline: accuracy of the heuristic update rule used by the system
- oracle baseline: accuracy if we knew exactly when the user corrects
19 informative features
- k=2 hypotheses + other
- priors and confusability
- initial confidence score
- concept identity
- barge-in
- expectation match
- repeated grammar slots
20 outline
- proposed approach
- data
- experiments and results
- effect on dialog performance
- conclusion
21 a question remains …
… does this really matter? what is the effect on global dialog performance?
22 let’s run an experiment
- guinea pigs from Speech Lab for exp: $0
- getting change from guys in the lab: $2/$3/$5
- real subjects for the experiment: $25
- picture with advisor of the VERY last exp at CMU: priceless!!!!
[courtesy of Mohit Kumar]
23 a new user study …
- implemented models in RavenClaw, performed a new user study
- 40 participants, first-time users
  - 10 scenario-driven interactions each
  - non-native speakers of North-American English (improvements more likely at higher WER, supported by empirical evidence)
- between-subjects design; 2 gender-balanced groups
  - control: RoomLine using heuristic update rules
  - treatment: RoomLine using runtime models
24 effect on task success
[chart] task success: 73.6% (control) vs. 81.3% (treatment), even though the average per-user WER was 21.9% (control) vs. 24.2% (treatment)
25 effect on task success … a closer look
Task Success ← 2.09 - 0.05∙WER + 0.69∙Condition   (p=0.001)
[plot: probability of task success vs. word error rate for control and treatment; at 30% WER the fitted model gives roughly 64% (control) vs. 78% (treatment), about the control level at 16% WER]
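To make the fitted model concrete, a small worked computation (assuming a standard logistic regression with WER in percent and Condition = 0 for control, 1 for treatment; the rounded numbers below follow from the coefficients shown on the slide):

```python
import math

def p_success(wer: float, condition: int) -> float:
    """Predicted task success probability: logit(p) = 2.09 - 0.05*WER + 0.69*Condition."""
    logit = 2.09 - 0.05 * wer + 0.69 * condition
    return 1.0 / (1.0 + math.exp(-logit))

for wer in (16, 30):
    print(f"WER={wer}%: control={p_success(wer, 0):.0%}, treatment={p_success(wer, 1):.0%}")
# WER=16%: control=78%, treatment=88%
# WER=30%: control=64%, treatment=78%
```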
26 improvements at different WER
[plot: absolute improvement in task success as a function of word error rate]
27 effect on task duration (for successful tasks)
- ANOVA on task duration for successful tasks
- Duration ← -0.21 + 0.013∙WER - 0.106∙Condition
- significant improvement, equivalent to a 7.9% absolute reduction in WER
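The “equivalent to a 7.9% absolute reduction in WER” figure presumably comes from the ratio of the two coefficients: the duration saved by the treatment condition offsets the duration added by that many percentage points of WER. With the rounded coefficients shown on the slide:

equivalent WER reduction = 0.106 / 0.013 ≈ 8.2 percentage points

which is consistent with the reported 7.9% once the unrounded coefficients are taken into account.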
28 outline
- proposed approach
- data
- experiments and results
- effect on dialog performance
- conclusion
29 summary
- a data-driven approach for constructing accurate system beliefs
  - integrates information across multiple turns
  - bridges the detection of misunderstandings and the detection of corrections
- significantly outperforms current heuristics
- significantly improves effectiveness and efficiency
30 other advantages
- sample efficient: performs a local, one-turn optimization; good local performance leads to good global performance
- scalable: works independently on concepts (29 concepts, varying cardinalities)
- portable: decoupled from the dialog task specification; doesn’t make strong assumptions about dialog management technology
31 thank you! questions …
32 user study
- 10 scenarios, fixed order
- presented graphically (explained during briefing)
- participants compensated per task success