2
belief updating in spoken dialog systems
Dan Bohus
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213
acknowledgements: Alex Rudnicky (CMU); Eric Horvitz & Tim Paek (MSR); Antoine Raux
3
2/36 spoken dialog systems
use natural language processing technology
engage in a goal-oriented conversation
research community: information access, command-and-control, personal assistants, taskable agents, tutoring systems
industry: simpler automated phone systems
4
3/36 the problem … misunderstandings lead to interaction breakdowns
5
4/36 more concretely …
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT same PAY]
S: Sorry I’m not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [for minute since HEY]
S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I’m traveling to Birmingham [THE TRAVELING to berlin P_M]
S: traveling in the afternoon. Okay what day would you be departing Chicago
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [flight destination mr WEEKEND]
S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm arrives Seoul at ………
6
5/36 understanding errors
stem from the speech recognition process
difficult operating conditions
typical word-error-rates: 20-30%, up to 50% for non-native speakers
strong negative impact on interactions
7
6/36 pathways to increased robustness
improve recognition
gracefully handle errors through conversation
  detect the problems
  develop a set of recovery strategies
  know how to choose between them (policy)
8
7/36 guarding against misunderstandings
S: […] What city are you leaving from?
U: [for minute since HEY]
S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: [SEOUL]
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
S: traveling in the afternoon. Okay what day would you be departing Chicago
U: [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: [flight destination mr WEEKEND]
confidence scores shown beside the user turns: 0.72, 0.35, 0.58, 0.65, 0.28, 0.07
9
8/36 guarding against misunderstandings
S: […] What city are you leaving from?
U: [for minute since HEY]
S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: [SEOUL]
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
S: traveling in the afternoon. Okay what day would you be departing Chicago
U: [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: [flight destination mr WEEKEND]
confidence scores shown beside the user turns: 0.72, 0.35, 0.58, 0.65, 0.28, 0.07
arrival = {Seoul / 0.65}
confirmation actions
  reject
  explicit confirmation: Did you say Seoul?
  implicit confirmation: traveling to Seoul … What day did you need to travel?
  accept
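A common way to pick among the four confirmation actions on this slide is to threshold the confidence score. The sketch below illustrates that idea only; the function name and the three threshold values are assumptions made for illustration and do not come from the talk.

```python
def choose_confirmation_action(confidence: float,
                               reject_below: float = 0.3,
                               explicit_below: float = 0.6,
                               implicit_below: float = 0.9) -> str:
    """Map a confidence score in [0, 1] to one of the four actions.

    The thresholds are hypothetical defaults, not values from the talk.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence < reject_below:
        return "reject"
    if confidence < explicit_below:
        return "explicit_confirmation"
    if confidence < implicit_below:
        return "implicit_confirmation"
    return "accept"


# With these assumed thresholds, arrival = {Seoul / 0.65} is implicitly confirmed.
action = choose_confirmation_action(0.65)
```

Under these assumed thresholds the low-confidence turns (for example 0.07) would be rejected, while mid-range scores trigger an explicit question.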
10
9/36 belief updating
S: […] What city are you leaving from?
U: [for minute since HEY]
S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: [SEOUL]
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
S: traveling in the afternoon. Okay what day would you be departing Chicago
U: [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: [flight destination mr WEEKEND]
confidence scores shown beside the user turns: 0.72, 0.35, 0.58, 0.65, 0.28, 0.07
arrival = {Seoul / 0.65} → f → arrival = ?
arrival = { … } departure = { … }
11
10/36 belief updating: problem statement
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
arrival = {Seoul / 0.65} → f → arrival = ?
given
  an initial belief B_initial(C) over concept C
  a system action SA(C)
  a user response R
construct an updated belief
  B_updated(C) ← f(B_initial(C), SA(C), R)
12
11/36 outline
related work
proposed approach
data
experiments and results
effects on global performance
conclusion and future work
related work : proposed approach : data : experiments and results : global performance : conclusion
13
12/36 detecting misunderstandings and corrections
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
Conf=0.35, Corr=0.47
arrival = {Seoul / 0.65} → ? → arrival = ?
confidence annotation
  word-level [Cox, Chase, Bansal, Ravishankar, etc.]
  semantic confidence annotation [Walker, San-Segundo, Bohus, etc.]
correction detection [Litman, Swerts, Hirschberg, Krahmer, Levow]
  detect when the user corrects the system
14
13/36 current solutions for tracking beliefs
most systems only track single values
  new values overwrite old values
use simple heuristic rules
explicit confirmation
  S: did you say you wanted to fly to Seoul?
  yes → trust hypothesis
  no → delete hypothesis
  “other” → non-understanding
implicit confirmation
  S: traveling to Seoul … what day did you need to travel?
  rely on new values overwriting old values
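The heuristic rules on this slide fit in a few lines of code. The following is a minimal sketch of single-value belief tracking under those rules; the function name, the argument names, and the choice to leave the belief unchanged on a non-understanding are assumptions, not details from the talk.

```python
from typing import Optional


def heuristic_update(hypothesis: Optional[str],
                     system_action: str,
                     response_type: str,
                     new_value: Optional[str] = None) -> Optional[str]:
    """Return the single tracked value after one user turn.

    response_type is "yes", "no", or "other" (hypothetical labels).
    """
    if system_action == "explicit_confirmation":
        if response_type == "yes":
            return hypothesis      # yes: trust the hypothesis
        if response_type == "no":
            return None            # no: delete the hypothesis
        return hypothesis          # other: non-understanding (assumed: no change)
    # Implicit confirmation and all other actions: a newly heard value
    # simply overwrites the old one.
    return new_value if new_value is not None else hypothesis


after_no = heuristic_update("Seoul", "explicit_confirmation", "no")
after_overwrite = heuristic_update("Seoul", "implicit_confirmation", "other", "Berlin")
```

Because only one value is kept and no confidence is consulted, a misrecognized value overwrites a correct one just as readily as a genuine correction does.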
15
14/36 outline
related work
proposed approach
data
experiments and results
effects on global performance
conclusion and future work
16
15/36 belief updating: problem statement
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
arrival = {Seoul / 0.65} → f → arrival = ?
given
  an initial belief B_initial(C) over concept C
  a system action SA(C)
  a user response R
construct an updated belief
  B_updated(C) ← f(B_initial(C), SA(C), R)
17
16/36 belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
most accurate representation
  probability distribution over the set of possible values
  departure: ABERDEEN, TX; ABILENE, TX; ALBANY, NY; ALBUQUERQUE, NM; ALLENTOWN, PA; ALEXANDRIA, LA; ALLAKAKET, AK; ALLIANCE, NE; ALPENA, MI; ALPINE, TX; … YUMA, AZ
however
  system “hears” only a small number of conflicting values for a concept throughout a session
  max = 3 conflicting values heard
  more than 1 value heard in only 7% of cases
18
17/36 belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
compressed belief representation
  k hypotheses + other
  dynamically add and drop hypotheses
  remember m hypotheses, add n new ones (m+n=k)
B…(C) is a multinomial variable of degree k+1
example: departure_city [k=3, m=2, n=1]
  hypotheses: Austin, Boston, Houston, other
  S: Did you say you were flying from Austin?
  U: [NO ASPEN] (new value: Aspen)
  S: flying from Aspen… what is your destination?
  U: [NO NO I DIDN’T THAT THAT] (new value: Ø)
  other hypothesis sets shown: Boston, Aspen, other; Boston, Austin, other
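The bookkeeping behind "remember m hypotheses, add n new ones" can be sketched as follows. This covers only the slot management, not the probability update itself; the function name, the dictionary representation, and the example probabilities are all assumptions made for illustration.

```python
from typing import Dict, List

OTHER = "other"


def compress(belief: Dict[str, float], new_values: List[str],
             m: int = 2, n: int = 1) -> Dict[str, float]:
    """Keep the m most probable hypotheses, open up to n slots for newly
    heard values, and fold all remaining probability mass into 'other'."""
    # Rank the explicit hypotheses (everything except "other") by probability.
    ranked = sorted(
        ((value, prob) for value, prob in belief.items() if value != OTHER),
        key=lambda pair: pair[1],
        reverse=True,
    )
    kept = dict(ranked[:m])
    added = 0
    for value in new_values:
        if value not in kept and added < n:
            # A new slot starts with whatever mass the value already had (0 if unseen).
            kept[value] = belief.get(value, 0.0)
            added += 1
    kept[OTHER] = max(0.0, 1.0 - sum(kept.values()))
    return kept


# Made-up probabilities for the departure_city example (k=3, m=2, n=1).
belief = {"Austin": 0.5, "Boston": 0.3, "Houston": 0.1, OTHER: 0.1}
slots = compress(belief, ["Aspen"])
```

Here Houston is dropped and its mass moves to "other", leaving k=3 explicit slots (Austin, Boston, Aspen) plus "other" for the update model to score.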
19
18/36 system action
B_updated(C) ← f(B_initial(C), SA(C), R)
request
  S: When would you like to take this flight?
  U: Friday [FRIDAY] / 0.65
explicit confirmation
  S: Did you say you wanted to fly this Friday?
  U: Yes [GUEST] / 0.30
implicit confirmation
  S: A flight for Friday … at what time?
  U: At ten a.m. [AT TEN A_M] / 0.86
no action / unexpected update
  S: okay. I will complete the reservation. Please tell me your name or say ‘guest user’ if you are not a registered user.
  U: guest user [THIS TUESDAY] / 0.55
20
19/36 user response
B_updated(C) ← f(B_initial(C), SA(C), R)
acoustic / prosodic: acoustic and language scores, duration, pitch information, voiced-to-unvoiced ratio, speech rate, initial pause
lexical: number of words, presence of words highly correlated with corrections or acknowledgements
grammatical: number of slots (new and repeated), goodness-of-parse scores
dialog: dialog state, turn number, expectation match, timeout, barge-in, concept identity
priors: priors for concept values
confusability: how confusable concept values are
21
20/36 approach
B_updated(C) ← f(B_initial(C), SA(C), R)
multinomial regression problem
multinomial generalized linear model
  sample efficient
stepwise approach
  feature selection
  BIC to control over-fitting
one separate model for each system action
  B_updated(C) ← f_SA(C)(B_initial(C), R)
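To make "one separate multinomial model for each system action" concrete, the sketch below applies a multinomial logistic (softmax) model selected by system action. It shows prediction only, with no fitting, feature selection, or BIC; the weights, the three-element feature vector, and all names are invented for illustration and are not the models from the talk.

```python
import math
from typing import Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one score per outcome."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_belief(weights: List[List[float]], features: Sequence[float]) -> List[float]:
    """weights[j] holds the coefficients for outcome j (k hypotheses + other)."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)


# One hypothetical weight matrix per system action, here for k=2 + other.
# Assumed features: [bias, initial confidence, "user said no" indicator].
MODELS: Dict[str, List[List[float]]] = {
    "explicit_confirmation": [[0.5, 2.0, -3.0], [0.0, -1.0, 1.5], [0.0, 0.0, 0.0]],
    "implicit_confirmation": [[0.8, 1.5, -1.0], [0.0, -0.5, 0.5], [0.0, 0.0, 0.0]],
}


def update_belief(system_action: str, features: Sequence[float]) -> List[float]:
    """f_SA(C): choose the model by system action, then score the outcomes."""
    return predict_belief(MODELS[system_action], features)


updated = update_belief("explicit_confirmation", [1.0, 0.65, 1.0])
```

The output is a probability distribution over the k+1 outcomes, which is what lets the system keep several competing hypotheses instead of a single overwritten value.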
22
21/36 outline
related work
proposed approach
data
experiments and results
effects on global performance
conclusion and future work
23
22/36 data
collected with RoomLine
  a phone-based mixed-initiative spoken dialog system
  conference room reservation
explicit and implicit confirmations
simple heuristic rules for belief updating
  explicit confirm: yes / no
  implicit confirm: new values overwrite old ones
24
23/36 corpus
user study
  46 participants (first-time users)
  10 scenario-based interactions each
corpus
  449 sessions, 8848 user turns
  orthographically transcribed
  manually annotated: misunderstandings, corrections, correct concept values
25
24/36 outline
related work
proposed approach
data
experiments and results
effects on global performance
conclusion and future work
26
25/36 models
k=2 + other (m=1, n=1)
k=3 + other (m=2, n=1)
k=4 + other (m=3, n=1)
full model: all features
basic model: all features except priors and confusability
runtime model: all features available at runtime
27
26/36 baselines
initial baseline: accuracy of system beliefs before the update
heuristic baseline: accuracy of heuristic update rule used by the system
correction baseline: accuracy if we knew exactly when the user corrects the system
28
27/36 results for k=2 hyps + other
legend: initial baseline (i), heuristic baseline (h), basic model (BM), full model (FM), runtime model (RM), correction baseline (c)
[figure: four bar-chart panels; value labels are listed in the order they were extracted]
explicit confirm (bars i, h, BM, FM, RM, c; axis 0% to 30%): 30.8, 16.1, 6.1, 5.0, 5.2, 6.2
implicit confirm (bars i, h, BM, FM, RM, c; axis 0% to 30%): 30.3, 26.0, 18.3, 15.0, 15.8, 21.5
request (bars i, h, BM, FM, RM; axis 0% to 12%): 98.2 9.5 8.6 5.7 5.6
other (bars i, h, BM, FM, RM; axis 0% to 45%): 79.7 44.8 19.3 14.8
29
28/36 a question remains … does this really matter?
30
29/36 outline
related work
proposed approach
data
experiments and results
effects on global performance
conclusion and future work
31
30/36 a new user study …
implemented models in RavenClaw
40 participants, first-time, non-native users
  improvements more likely at high word-error-rates
10 scenario-driven interactions each
between-subjects; 2 gender-balanced groups
  control: RoomLine using heuristic update rules
  treatment: RoomLine using runtime models
32
31/36 effect on task success
logistic ANOVA on task success
logit(TaskSuccess) ← 2.09 - 0.05∙WER + 0.69∙Condition
p=0.009
[plot: probability of task success (0% to 100%) against word error rate (0% to 100%), one curve each for treatment and control; annotations: 78% at 16% word error rate; 78% (treatment) and 64% (control) at 30% word error rate]
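The fitted logit can be evaluated directly to reproduce the percentages annotated on the plot. The sketch below assumes WER is expressed in percentage points and Condition is coded 1 for treatment and 0 for control; the slide does not state either coding, but both are consistent with its numbers.

```python
import math


def p_task_success(wer_percent: float, treatment: bool) -> float:
    """Probability of task success under the logistic model on the slide.

    Assumed coding: WER in percentage points, Condition = 1 for treatment.
    """
    logit = 2.09 - 0.05 * wer_percent + 0.69 * (1 if treatment else 0)
    return 1.0 / (1.0 + math.exp(-logit))


control_30 = p_task_success(30, treatment=False)    # about 0.64
treatment_30 = p_task_success(30, treatment=True)   # about 0.78
control_16 = p_task_success(16, treatment=False)    # about 0.78
```

Under this coding, the treatment group at 30% word error rate succeeds about as often as the control group at 16% word error rate.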
33
32/36 how about efficiency?
ANOVA on task duration for successful tasks
Duration ← -0.21 + 0.013∙WER - 0.106∙Condition
p=0.0003
significant improvement
  equivalent to 7.9% absolute reduction in word-error-rate
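The "equivalent reduction in word-error" figure can be read off the regression as the ratio of the Condition coefficient to the WER coefficient. A small sketch, assuming WER is in percentage points and Condition is 1 for treatment; with the rounded coefficients printed on the slide the ratio comes out near 8.2, so the reported 7.9% presumably reflects unrounded coefficients.

```python
def equivalent_wer_reduction(wer_coef: float = 0.013,
                             condition_coef: float = -0.106) -> float:
    """WER change (in points) whose effect on Duration matches the treatment effect.

    Defaults are the rounded coefficients shown on the slide.
    """
    return -condition_coef / wer_coef


reduction_points = equivalent_wer_reduction()
```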
34
33/36 outline
related work
proposed approach
data
experiments and results
effects on global performance
conclusion and future work
35
34/36 summary
U: [CHICAGO] / 0.72
S: traveling from Chicago. Where would you like to go?
U: [SEOUL] / 0.65
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
S: traveling in the afternoon. Okay what day would you be departing Chicago
arrival = {Seoul / 0.65} → f → arrival = ?
arrival = { … } departure = { … }
approach for constructing accurate beliefs
  integrate information across multiple turns
large gains in task success and efficiency
36
35/36 other advantages
learns from data
  tuned to the domain in which it operates
sample efficient / scalable
  performs a local one-turn optimization
  works independently on concepts
portable
  decoupled from dialog task specification
  no strong assumptions about dialog management
37
36/36 future work
integrate information from n-best list
integrate other high-level knowledge
  domain-specific constraints
  inter-concept dependencies
unsupervised / implicit learning
domain-specificity
38
37/36 thank you! questions …
39
38/36 improvements at different WER
[plot: absolute improvement in task success against word-error-rate]
40
39/36 user study
10 scenarios, fixed order
presented graphically (explained during briefing)
participants compensated per task success
41
40/36 informative features
priors and confusability
initial confidence scores
concept identity
barge-in
expectation match
repeated grammar slots