belief updating in spoken dialog systems
Dan Bohus
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213
acknowledgements: Alex Rudnicky, CMU; Eric Horvitz & Tim Paek, MSR; Antoine Raux

2/36 spoken dialog systems
• use natural language processing technology
• engage in a goal-oriented conversation
• research community
  • information access, command-and-control
  • personal assistants, taskable agents, tutoring systems
• industry
  • simpler automated phone systems

3/36 the problem … misunderstandings lead to interaction breakdowns

4/36 more concretely …
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT same PAY]
S: Sorry I’m not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [for minute since HEY]
S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I’m traveling to Birmingham [THE TRAVELING to berlin P_M]
S: traveling in the afternoon. Okay what day would you be departing Chicago?
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [flight destination mr WEEKEND]
S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm arrives Seoul at ………

5/36 understanding errors
• stem from the speech recognition process
• difficult operating conditions
• typical word-error-rates
  • 20-30%
  • up to 50% for non-native speakers
• strong negative impact on interactions

6/36 pathways to increased robustness
• improve recognition
• gracefully handle errors through conversation
  • detect the problems
  • develop a set of recovery strategies
  • know how to choose between them (policy)

7/36 guarding against misunderstandings
S: […] What city are you leaving from?
U: [for minute since HEY]
S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: [SEOUL]
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
S: traveling in the afternoon. Okay what day would you be departing Chicago?
U: [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: [flight destination mr WEEKEND]
confidence scores (as extracted from the slide): 0.72, 0.35, 0.58, 0.65, 0.28, 0.07

8/36 guarding against misunderstandings
S: […] What city are you leaving from?
U: [for minute since HEY]
S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: [SEOUL]
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
S: traveling in the afternoon. Okay what day would you be departing Chicago?
U: [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: [flight destination mr WEEKEND]
confidence scores (as extracted from the slide): 0.72, 0.35, 0.58, 0.65, 0.28, 0.07
arrival = {Seoul / 0.65}
confirmation actions
• reject
• explicit confirmation: Did you say Seoul?
• implicit confirmation: traveling to Seoul … What day did you need to travel?
• accept
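The four confirmation actions on this slide are usually selected from the confidence score. A minimal sketch of such a policy follows; the function name and the three threshold values are hypothetical and are not taken from the talk, which does not state how the system picks an action.

```python
def choose_confirmation_action(confidence,
                               reject_below=0.2,
                               explicit_below=0.5,
                               accept_from=0.9):
    """Map a confidence score in [0, 1] to one of the four actions.

    The thresholds are illustrative assumptions, not values from the talk.
    """
    if confidence < reject_below:
        return "reject"
    if confidence < explicit_below:
        return "explicit confirmation"
    if confidence < accept_from:
        return "implicit confirmation"
    return "accept"


# With these assumed thresholds, arrival = {Seoul / 0.65} is confirmed implicitly.
action_for_seoul = choose_confirmation_action(0.65)
```

Fixed thresholds like these look only at the current score, which is the limitation that belief updating across turns is meant to address.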

9/36 belief updating
S: […] What city are you leaving from?
U: [for minute since HEY]
S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: [SEOUL]
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
S: traveling in the afternoon. Okay what day would you be departing Chicago?
U: [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: [flight destination mr WEEKEND]
confidence scores (as extracted from the slide): 0.72, 0.35, 0.58, 0.65, 0.28, 0.07
arrival = {Seoul / 0.65} → f → arrival = ?
arrival = { … } departure = { … }

10/36 belief updating: problem statement
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
arrival = {Seoul / 0.65} → f → arrival = ?
• given
  • an initial belief B_initial(C) over concept C
  • a system action SA(C)
  • a user response R
• construct an updated belief
  • B_updated(C) ← f(B_initial(C), SA(C), R)

11/36 outline
• related work
• proposed approach
• data
• experiments and results
• effects on global performance
• conclusion and future work
related work : proposed approach : data : experiments and results : global performance : conclusion

12/36 detecting misunderstandings and corrections
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
• confidence annotation
  • word-level [Cox, Chase, Bansal, Ravishankar, etc.]
  • semantic confidence annotation [Walker, San-Segundo, Bohus, etc.]
• correction detection [Litman, Swerts, Hirschberg, Krahmer, Levow]
  • detect when the user corrects the system
Conf=0.35, Corr=0.47: arrival = {Seoul / 0.65} → arrival = ?

13/36 current solutions for tracking beliefs
• most systems only track single values
  • new values overwrite old values
• use simple heuristic rules
  • explicit confirmation
    S: did you say you wanted to fly to Seoul?
    yes → trust hypothesis
    no → delete hypothesis
    “other” → non-understanding
  • implicit confirmation
    S: traveling to Seoul … what day did you need to travel?
    rely on new values overwriting old values
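The heuristic rules listed on this slide can be sketched in a few lines. This is an illustrative reading, assuming a single tracked value per concept; the function name, argument names and status labels are hypothetical and do not come from any particular system.

```python
from typing import Optional, Tuple


def heuristic_update(current: Optional[str],
                     system_action: str,
                     answer: str = "other",
                     heard_value: Optional[str] = None) -> Tuple[Optional[str], str]:
    """Return (new tracked value, status) under the simple heuristic rules.

    system_action: "explicit_confirm" or "implicit_confirm" (assumed labels).
    answer: "yes", "no" or "other", used only for explicit confirmations.
    heard_value: a new value for the concept recognized in the response, if any.
    """
    if system_action == "explicit_confirm":
        if answer == "yes":
            return current, "trusted"          # yes -> trust hypothesis
        if answer == "no":
            return None, "deleted"             # no -> delete hypothesis
        return current, "non-understanding"    # anything else
    # Implicit confirmation: rely on new values overwriting old values.
    if heard_value is not None:
        return heard_value, "overwritten"
    return current, "unchanged"


# A misrecognized value silently replaces the old one after an implicit confirmation.
value, status = heuristic_update("Seoul", "implicit_confirm", heard_value="Berlin")
```

Note that nothing here uses confidence scores or earlier hypotheses, which is why a single misrecognition is enough to corrupt the tracked value.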

15/36 belief updating: problem statement
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
arrival = {Seoul / 0.65} → f → arrival = ?
• given
  • an initial belief B_initial(C) over concept C
  • a system action SA(C)
  • a user response R
• construct an updated belief
  • B_updated(C) ← f(B_initial(C), SA(C), R)

16/36 belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
• most accurate representation
  • probability distribution over the set of possible values
  [figure: distribution over departure values: ABERDEEN, TX; ABILENE, TX; ALBANY, NY; ALBUQUERQUE, NM; ALLENTOWN, PA; ALEXANDRIA, LA; ALLAKAKET, AK; ALLIANCE, NE; ALPENA, MI; ALPINE, TX; YUMA, AZ]
• however
  • system “hears” only a small number of conflicting values for a concept throughout a session
  • max = 3 conflicting values heard
  • only in 7% of cases, more than 1 value heard

17/36 belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
• compressed belief representation
  • k hypotheses + other
  • dynamically add and drop hypotheses
  • remember m hypotheses, add n new ones (m+n=k)
• B_…(C) is a multinomial variable of degree k+1
departure_city [k=3, m=2, n=1]
  Austin | Boston | Houston | other
  S: Did you say you were flying from Austin?
  U: [NO ASPEN] → Aspen
  S: flying from Aspen… what is your destination?
  U: [NO NO I DIDN’T THAT THAT] → Ø
  Boston | Aspen | other
  Boston | Austin | other
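The slot bookkeeping behind the "k hypotheses + other" representation can be sketched as follows. This shows only how hypotheses are remembered, added and dropped; the probabilities themselves would come from the update function f. The function name, the choice to rank by probability, the zero initial mass for new values and all numbers are assumptions made for illustration.

```python
OTHER = "other"


def carry_over(belief, new_values, m, n):
    """Keep the m most probable hypotheses, add up to n new ones, fold the rest into 'other'.

    belief: dict mapping value -> probability, including the OTHER entry.
    new_values: values heard in the latest user response (may be empty).
    """
    named = {value: p for value, p in belief.items() if value != OTHER}
    kept = sorted(named, key=named.get, reverse=True)[:m]
    result = {value: named[value] for value in kept}
    fresh = []
    for value in new_values:
        if value not in result and value not in fresh:
            fresh.append(value)
    for value in fresh[:n]:
        # A brand-new value starts with no mass; a dropped one keeps what it had.
        result[value] = named.get(value, 0.0)
    # Everything that is no longer tracked explicitly is absorbed by 'other'.
    result[OTHER] = max(0.0, 1.0 - sum(result.values()))
    return result


# Hypothetical probabilities for the departure_city example with k=3, m=2, n=1.
before = {"Austin": 0.3, "Boston": 0.4, "Houston": 0.2, OTHER: 0.1}
after = carry_over(before, ["Aspen"], m=2, n=1)
```

With these made-up numbers Houston is dropped, its mass moves to other, and Aspen takes the free slot, so the belief stays a multinomial over k+1 outcomes.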

18/36 system action
B_updated(C) ← f(B_initial(C), SA(C), R)
• request
  S: When would you like to take this flight?
  U: Friday [FRIDAY] / 0.65
• explicit confirmation
  S: Did you say you wanted to fly this Friday?
  U: Yes [GUEST] / 0.30
• implicit confirmation
  S: A flight for Friday … at what time?
  U: At ten a.m. [AT TEN A_M] / 0.86
• no action / unexpected update
  S: okay. I will complete the reservation. Please tell me your name or say ‘guest user’ if you are not a registered user.
  U: guest user [THIS TUESDAY] / 0.55

19/36 user response
B_updated(C) ← f(B_initial(C), SA(C), R)
• acoustic / prosodic: acoustic and language scores, duration, pitch information, voiced-to-unvoiced ratio, speech rate, initial pause
• lexical: number of words, presence of words highly correlated with corrections or acknowledgements
• grammatical: number of slots (new and repeated), goodness-of-parse scores
• dialog: dialog state, turn number, expectation match, timeout, barge-in, concept identity
• priors: priors for concept values
• confusability: how confusable concept values are

20/36 approach
B_updated(C) ← f(B_initial(C), SA(C), R)
• multinomial regression problem
• multinomial generalized linear model
  • sample efficient
  • stepwise approach: feature selection, BIC to control over-fitting
• one separate model for each system action
  B_updated(C) ← f_SA(C)(B_initial(C), R)
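One common form of a multinomial generalized linear model is multinomial logistic regression, where a softmax turns per-outcome linear scores into a distribution. The sketch below assumes that form, with one weight table per system action as on the slide. All weights and the three-feature layout are made up for illustration; nothing here is fitted to data, and the stepwise feature selection with BIC is not shown.

```python
import math


def softmax(scores):
    """Convert real-valued scores into probabilities that sum to 1."""
    top = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def updated_belief(weights, features):
    """weights: one row of coefficients per outcome (k hypotheses + other)."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)


# Hypothetical coefficients for k=2 + other; features are
# [bias, initial confidence, confidence of the new response].
HYPOTHETICAL_MODELS = {
    "explicit_confirm": [[0.5, 2.0, 1.0], [-0.5, 0.0, 0.5], [0.0, 0.0, 0.0]],
    "implicit_confirm": [[0.2, 1.5, 0.0], [-0.2, 0.0, 1.5], [0.0, 0.0, 0.0]],
}


def belief_update(system_action, features):
    """Pick the model for the system action, then predict the updated belief."""
    return updated_belief(HYPOTHETICAL_MODELS[system_action], features)


belief = belief_update("implicit_confirm", [1.0, 0.65, 0.35])
```

Keeping a separate weight table per system action lets the same feature (for example a "no" in the response) carry different meaning after an explicit confirmation than after a request.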

22/36 data
• collected with RoomLine
  • a phone-based mixed-initiative spoken dialog system
  • conference room reservation
• explicit and implicit confirmations
• simple heuristic rules for belief updating
  • explicit confirm: yes / no
  • implicit confirm: new values overwrite old ones

23/36 corpus
• user study
  • 46 participants (first-time users)
  • 10 scenario-based interactions each
• corpus
  • 449 sessions, 8848 user turns
  • orthographically transcribed
  • manually annotated: misunderstandings, corrections, correct concept values

25/36 models
• k=2 + other (m=1, n=1)
• k=3 + other (m=2, n=1)
• k=4 + other (m=3, n=1)
• full model: all features
• basic model: all features except priors and confusability
• runtime model: all features available at runtime

26/36 baselines
• initial baseline: accuracy of system beliefs before the update
• heuristic baseline: accuracy of heuristic update rule used by the system
• correction baseline: accuracy if we knew exactly when the user corrects the system

27/36 results for k=2 hyps + other
[bar charts, one panel per system action, percent scale; bars: initial baseline (i), heuristic baseline (h), basic model (BM), full model (FM), runtime model (RM), correction baseline (c); values listed as extracted from each panel]
• explicit confirm (i, h, BM, FM, RM, c): 30.8, 16.1, 6.1, 5.0, 5.2, 6.2
• implicit confirm (i, h, BM, FM, RM, c): 30.3, 26.0, 18.3, 15.0, 15.8, 21.5
• request (i, h, BM, FM, RM): 98.2, 9.5, 8.6, 5.7, 5.6
• other (i, h, BM, FM, RM): 79.7, 44.8, 19.3, 14.8

28/36 a question remains … does this really matter?

30/36 a new user study …
• implemented models in RavenClaw
• 40 participants, first-time, non-native users
  • improvements more likely at high word-error-rates
• 10 scenario-driven interactions each
• between-subjects; 2 gender-balanced groups
  • control: RoomLine using heuristic update rules
  • treatment: RoomLine using runtime models

31/36 effect on task success
• logistic ANOVA on task success
  logit(TaskSuccess) ← 2.09 - 0.05∙WER + 0.69∙Condition (p=0.009)
[plot: probability of task success (0% to 100%) against word error rate (0% to 100%), one curve each for treatment and control; marked values: 78% at 16% word error rate; 78% and 64% at 30% word error rate]
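The fitted model on this slide can be evaluated directly. A minimal sketch, assuming WER is entered in percentage points and Condition is coded 1 for treatment and 0 for control; neither coding is stated on the slide, and the function name is illustrative.

```python
import math


def task_success_probability(wer_percent, treatment):
    """Invert logit(TaskSuccess) = 2.09 - 0.05*WER + 0.69*Condition."""
    condition = 1 if treatment else 0      # assumed coding: treatment = 1
    logit = 2.09 - 0.05 * wer_percent + 0.69 * condition
    return 1.0 / (1.0 + math.exp(-logit))


control_at_30 = task_success_probability(30, treatment=False)    # about 0.64
treatment_at_30 = task_success_probability(30, treatment=True)   # about 0.78
control_at_16 = task_success_probability(16, treatment=False)    # about 0.78

# WER reduction (in points) that would give the control the same logit gain.
equivalent_wer_reduction = 0.69 / 0.05
```

Under these assumptions the 78% and 64% marked at 30% word error rate are the treatment and control curves, and the control reaches 78% only at about 16% word error rate, so the treatment effect is worth roughly 14 points of word error rate.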

32/36 how about efficiency?
• ANOVA on task duration for successful tasks
  Duration ← -0.21 + 0.013∙WER - 0.106∙Condition (p=0.0003)
• significant improvement
• equivalent to 7.9% absolute reduction in word-error

34/36 summary
U: [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: [SEOUL]
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
S: traveling in the afternoon. Okay what day would you be departing Chicago?
confidence scores (as extracted from the slide): 0.72, 0.35, 0.65
arrival = {Seoul / 0.65} → f → arrival = ?
arrival = { … } departure = { … }
• approach for constructing accurate beliefs
• integrate information across multiple turns
• large gains in task success and efficiency

35/36 other advantages
• learns from data
  • tuned to the domain in which it operates
• sample efficient / scalable
  • performs a local one-turn optimization
  • works independently on concepts
• portable
  • decoupled from dialog task specification
  • no strong assumptions about dialog management

36/36 future work
• integrate information from n-best list
• integrate other high-level knowledge
  • domain-specific constraints
  • inter-concept dependencies
• unsupervised / implicit learning
• domain-specificity

37/36 thank you! questions …

38/36 improvements at different WER
[plot: absolute improvement in task success against word-error-rate]

39/36 user study
• 10 scenarios, fixed order
• presented graphically (explained during briefing)
• participants compensated per task success

40/36 informative features
• priors and confusability
• initial confidence scores
• concept identity
• barge-in
• expectation match
• repeated grammar slots
