1
a “k-hypotheses + other” belief updating model
Dan Bohus, Alex Rudnicky
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213
acknowledgements: Tim Paek, Eric Horvitz (Microsoft Research)
2
2/25 motivation
spoken language interfaces are still very brittle [Parade, Sunday, March 26]
3
3/25
S: What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul… What day did you need to travel?
U: no no I’m traveling to Birmingham [THE TRAVELING to berlin P_M]
S: traveling in the afternoon… okay, what day would you be departing Chicago?
U: the tenth of August [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [flight destination mr WEEKEND]
S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm arrives Seoul at ……
confidence scores: 0.72 / 0.35 / 0.58 / 0.65 / 0.28
misunderstandings
arrival = {Seoul / 0.65}
4
4/25 confidence scores and misunderstandings (dialogue as on slide 3)
confidence scores: 0.72 / 0.35 / 0.58 / 0.65 / 0.28
arrival = {Seoul / 0.65} → f → arrival = ?
arrival = { … } departure = { … }
5
5/25 belief updating: problem statement
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
arrival = {Seoul / 0.65} → f → arrival = ?
given
- an initial belief B_initial(C) over concept C
- a system action SA(C)
- a user response R
construct an updated belief
B_updated(C) ← f(B_initial(C), SA(C), R)
6
6/25 outline
- introduction
- current solutions
- approach
- experimental results
- effects on global performance
- conclusion and future work
intro : current solutions : approach : experimental results : global performance : conclusion
7
7/25 current solutions
S: traveling from Chicago. Where would you like to go?
U: [SEOUL] / 0.65
S: traveling to Seoul… what day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
arrival = {Seoul / 0.65} → f → arrival = ?
- confidence scores / detecting misunderstandings (e.g. / 0.72) [Cox, Chase, Bansal, Hazen, Ravishankar, Walker, San-Segundo, Bohus]
- detecting corrections [Litman, Swerts, Hirschberg, Krahmer, Levow]
- track single values
- use simple heuristic belief updating rules
  - explicit confirmations: yes / no
  - implicit confirmations: new values overwrite old values
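The heuristic rules listed on this slide can be sketched roughly as follows. This is only an illustration: the function name `heuristic_update`, the dictionary layout, the action labels, and the choices to set the confidence to 1.0 on a "yes" and to discard the value on a "no" are all assumptions, not the rules of any particular system. Treating "Berlin" as a parsed arrival value is likewise hypothetical.

```python
def heuristic_update(belief, system_action, response):
    """Single-value heuristic update (illustrative assumptions throughout).

    belief: None or {"value": str, "conf": float}
    system_action: "explicit_confirm" or "implicit_confirm" (hypothetical labels)
    response: dict with optional keys "answer" ("yes"/"no"), "value", "conf"
    """
    answer = response.get("answer")
    new_value = response.get("value")

    if system_action == "explicit_confirm" and belief is not None:
        if answer == "yes":
            # Assumed rule: a "yes" makes the system fully trust the value.
            return {"value": belief["value"], "conf": 1.0}
        if answer == "no":
            # Assumed rule: a "no" discards the value.
            return None
        return belief

    if new_value is not None:
        # Implicit confirmation: new values overwrite old values.
        return {"value": new_value, "conf": response.get("conf")}
    return belief  # nothing new was heard, keep the old value


arrival = {"value": "Seoul", "conf": 0.65}
heard = {"value": "Berlin", "conf": 0.35}  # hypothetical parse of the noisy turn
print(heuristic_update(arrival, "implicit_confirm", heard))
```

Note how the overwrite rule replaces a 0.65-confidence value with a 0.35-confidence one, which is the kind of behavior that tracking a single value cannot avoid.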
8
8/25 outline: introduction, current solutions, approach, experimental results, effects on global performance, conclusion and future work
9
9/25 belief updating: problem statement
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
arrival = {Seoul / 0.65} → f → arrival = ?
given
- an initial belief B_initial(C) over concept C
- a system action SA(C)
- a user response R
construct an updated belief
B_updated(C) ← f(B_initial(C), SA(C), R)
10
10/25 belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
probability distribution over the set of possible values
departure: ABERDEEN, TX; ABILENE, TX; ALBANY, NY; ALBUQUERQUE, NM; ALLENTOWN, PA; ALEXANDRIA, LA; ALLAKAKET, AK; ALLIANCE, NE; ALPENA, MI; ALPINE, TX; … YUMA, AZ
however
- system “hears” only a small number of conflicting values for a concept throughout a session
- max = 3 conflicting values heard
11
11/25 belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
compressed belief representation
- k hypotheses + other
- dynamically add and drop hypotheses
- remember m hypotheses, add n new ones (m+n=k)
- B … (C) is a multinomial variable of degree k+1
departure_city [k=3, m=2, n=1]
belief: Austin | Boston | Houston | other
S: Did you say you were flying from Austin?
U: [NO ASPEN]
new hypotheses: Aspen
S: flying from Aspen… what is your destination?
U: [NO NO I DIDN’T THAT THAT]
new hypotheses: Ø
belief rows shown: Boston | Aspen | other; Boston | Austin | other
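The "remember m hypotheses, add n new ones" step can be sketched as below. This is a minimal illustration under stated assumptions: the function name `select_hypotheses`, the probabilities, and the rule "keep the m most probable existing values" are hypothetical, and the sketch only builds the compressed support. The actual updated probabilities would come from the learned update function f, not from this code.

```python
OTHER = "other"


def select_hypotheses(belief, new_values, m=2, n=1):
    """Build a k-hypotheses + other support, with k = m + n.

    belief: dict mapping value -> probability, including an "other" entry.
    new_values: values heard in the current user response.
    """
    # Rank current hypotheses (excluding "other") by probability, keep the top m.
    ranked = sorted((v for v in belief if v != OTHER),
                    key=lambda v: belief[v], reverse=True)
    kept = ranked[:m]
    # Add at most n values from the new response that are not already kept.
    added = [v for v in new_values if v not in kept][:n]

    compressed = {v: belief[v] for v in kept}
    for v in added:
        compressed[v] = belief.get(v, 0.0)  # placeholder mass for a new value
    # Everything that was dropped is folded into "other".
    compressed[OTHER] = 1.0 - sum(compressed.values())
    return compressed


# Hypothetical probabilities for the departure_city example (k=3, m=2, n=1).
departure = {"Austin": 0.5, "Boston": 0.3, "Houston": 0.1, OTHER: 0.1}
print(select_hypotheses(departure, ["Aspen"]))
```

With these made-up numbers, Houston is dropped, Aspen enters as the single new hypothesis, and the mass Houston held moves into "other", so the belief stays a multinomial over k+1 outcomes.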
12
12/25 system action
B_updated(C) ← f(B_initial(C), SA(C), R)
request
S: When would you like to take this flight?
U: Friday [FRIDAY] / 0.65
explicit confirmation
S: Did you say you wanted to fly this Friday?
U: Yes [GUEST] / 0.30
implicit confirmation
S: A flight for Friday … at what time?
U: At ten a.m. [AT TEN A_M] / 0.86
no action / unexpected update
S: okay. I will complete the reservation. Please tell me your name or say ‘guest user’ if you are not a registered user.
U: guest user [THIS TUESDAY] / 0.55
13
13/25 user response
B_updated(C) ← f(B_initial(C), SA(C), R)
- acoustic / prosodic: acoustic and language scores, duration, pitch information, voiced-to-unvoiced ratio, speech rate, initial pause
- lexical: number of words, presence of words highly correlated with corrections or acknowledgements
- grammatical: number of slots (new and repeated), goodness-of-parse scores
- dialog: dialog state, turn number, expectation match, timeout, barge-in, concept identity
- priors: priors for concept values
- confusability: how confusable concept values are
14
14/25 approach
B_updated(C) ← f(B_initial(C), SA(C), R)
- multinomial regression problem
- multinomial generalized linear model (sample efficient)
- stepwise approach (feature selection)
- one separate model for each system action: B_updated(C) ← f_SA(C)(B_initial(C), R)
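As a rough illustration of "one separate multinomial model for each system action", the sketch below dispatches on the system action and applies a softmax over linear scores. All names (`update_belief`, `MODELS`, the outcome and feature labels) and all weights are hypothetical, and using the initial top hypothesis as a zero-score reference outcome is an assumption made for the example.

```python
import math


def softmax(scores):
    """Turn a dict of real-valued scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(s - top) for k, s in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def update_belief(models, system_action, features):
    """Pick the model for this system action and score each outcome linearly."""
    weights = models[system_action]
    scores = {
        outcome: sum(w * features.get(name, 0.0) for name, w in coefs.items())
        for outcome, coefs in weights.items()
    }
    return softmax(scores)


# Hypothetical weights; an empty coefficient dict gives a reference score of 0.
MODELS = {
    "explicit_confirm": {
        "initial_top": {},
        "new_1": {"bias": -1.0, "answer_no": 2.0},
        "other": {"bias": -2.0, "answer_no": 1.0},
    },
}

belief = update_belief(MODELS, "explicit_confirm", {"bias": 1.0, "answer_no": 1.0})
print(belief)
```

In practice the weights would be fitted from annotated data and the features chosen by stepwise selection, as the slide indicates; this sketch only shows the shape of the computation.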
15
15/25 outline: introduction, current solutions, approach, experimental results, effects on global performance, conclusion and future work
16
16/25 data
- RoomLine: conference room reservations; explicit and implicit confirmations
- user study: 46 participants; 10 scenario-based interactions each
- corpus: 449 sessions, 8848 user turns; transcribed & annotated (misunderstandings, corrections, correct concept values)
17
17/25 model performance
- initial baseline (i) [error before update]
- heuristic baseline (h) [error after heuristic update]
- Model (M) [k=2, all features]
- correction baseline (c) [error if we had perfect correction detection]
error rates (%), listed in bar order i, h, M, c
system action        i      h      M      c
explicit confirm    30.8   16.1    5.0    6.2
implicit confirm    30.3   26.0   15.0   21.5
request             98.2    9.5    5.7
no action           79.7   44.8   14.8
18
18/25 outline: introduction, current solutions, approach, experimental results, effects on global performance, conclusion and future work
19
19/25 a new user study …
- implemented models in the system
- 2nd, between-subjects experiment
  - control: using heuristic update rules
  - treatment: using belief updating models
- 40 participants, non-native users
  - improvements more likely at high word-error-rates
20
20/25 effect on task success
logistic ANOVA on task success
logit(TaskSuccess) ← 2.09 - 0.05∙WER + 0.69∙Condition
p=0.009
[plot: probability of task success (0% to 100%) vs. word error rate (0% to 100%), treatment and control curves]
annotated points: 78% at 16% word error rate; 78% and 64% at 30% word error rate
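The fitted equation on this slide can be evaluated directly. The sketch below assumes that WER is expressed in percentage points and that Condition is coded 0 for control and 1 for treatment; neither coding is stated on the slide, but under these assumptions the model gives about 78% for control at 16% WER, and about 78% (treatment) versus 64% (control) at 30% WER, in line with the values annotated on the plot.

```python
import math


def task_success_probability(wer, condition):
    """Inverse logit of the slide's model.

    wer: word error rate in percentage points (assumed unit).
    condition: 0 = control, 1 = treatment (assumed coding).
    """
    logit = 2.09 - 0.05 * wer + 0.69 * condition
    return 1.0 / (1.0 + math.exp(-logit))


for wer in (16, 30):
    for condition, label in ((0, "control"), (1, "treatment")):
        p = task_success_probability(wer, condition)
        print(f"WER {wer}%, {label}: {p:.0%}")
```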
21
21/25 how about efficiency?
ANOVA on task duration for successful tasks
Duration ← -0.21 + 0.013∙WER - 0.106∙Condition
p=0.0003
significant improvement: equivalent to 7.9% absolute reduction in word-error
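The "equivalent reduction in word-error" reading follows from the ratio of the two coefficients: the drop in WER that would shorten tasks as much as switching to the treatment condition. The sketch below uses the rounded coefficients printed on the slide and assumes WER is in percentage points. It yields roughly 8.2 points, close to the 7.9% quoted; the gap is presumably due to coefficient rounding, which is an assumption rather than something the slide states.

```python
WER_COEF = 0.013         # duration change per point of WER (from the slide)
CONDITION_COEF = -0.106  # duration change for treatment vs. control (from the slide)


def equivalent_wer_reduction(wer_coef, condition_coef):
    """WER reduction (in points) with the same effect on duration as the treatment."""
    return -condition_coef / wer_coef


equivalent = equivalent_wer_reduction(WER_COEF, CONDITION_COEF)
print(f"{equivalent:.1f} points of WER")
```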
22
22/25 outline: introduction, current solutions, approach, experimental results, effects on global performance, conclusion and future work
23
23/25 summary
U: [CHICAGO] / 0.72
S: traveling from Chicago. Where would you like to go?
U: [SEOUL] / 0.65
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
S: traveling in the afternoon. Okay, what day would you be departing Chicago?
arrival = {Seoul / 0.65} → f → arrival = ?
arrival = { … } departure = { … }
- approach for constructing accurate beliefs
- integrate information across multiple turns
- significant gains in task success and efficiency
24
24/25 other advantages
- learns from data
  - tuned to the domain in which it operates
- sample efficient / scalable
  - local one-turn optimization, concepts are independent
  - RoomLine operates with 29 concepts
  - cardinality: 2 to several hundreds
- portable
  - decoupled from dialog task specification
  - no assumptions about dialog management
25
25/25 future work
- integrate information from n-best list
- integrate other high-level knowledge
  - domain-specific constraints
  - inter-concept dependencies
- investigate technique in other domains
26
26/25 thank you! questions …
27
27/25 improvements at different WER
[plot: absolute improvement in task success vs. word-error-rate]
28
28/25 user study
- 10 scenarios, fixed order
- presented graphically (explained during briefing)
- participants compensated per task success
29
29/25 informative features
- priors and confusability
- initial confidence scores
- concept identity
- barge-in
- expectation match
- repeated grammar slots
30
30/25 Models (k=2, runtime features)
# The model for the explicit confirm action
                                new_1    other
LR_MODEL(EC)
k                           =  -15.96     3.61
answer_type[YES]            =  -12.67    -5.90
answer_type[NO]             =    4.55     3.15
answer_type[OTHER]          =    1.20    -0.75
concept_id(equip)           =    6.96     4.42
i_th_confusability          =   -3.67    -4.80
ih_diff_lexical_one_word    =  -15.99    -1.17
lexw1[SMALL]                =   17.63    20.26
response_new_hyps_in_selh   =   18.85     0.41
END
31
31/25 Models (k=2, runtime features)
# The model for the implicit confirm action
                                new_1    other
LR_MODEL(IC)
mark_confirm                =    0.31    -1.74
mark_disconfirm             =    3.39     1.57
i_th_conf                   =    0.39    -3.63
i_th_confusability          =   -4.17    -4.54
k                           =  -16.83     3.75
lex[THREE]                  =   -2.25    -2.68
response_new_hyps_in_selh   =   20.88     1.70
turn_number                 =    0.01     0.03
END
32
32/25 Models (k=2, runtime features)
# The model for the request action
                                        new_1    other
LR_MODEL(REQ)
k                                   =   -0.78     3.56
barge_in                            =   -2.07    -1.40
concept_id(date)                    =   11.29     9.80
concept_id(user_name)               =    1.93   -13.91
dialog_state[RequestSpecificTimes]  =   13.29    14.26
ih_diff_lexical                     =   -1.54     0.17
initial_num_hyps_>_0                =  -21.70    -2.71
total_num_parses                    =   -1.06    -0.40
ur_selh_new_1_conf                  =    4.09     1.76
ur_selh_new_1_confusability         =    5.81     1.70
ur_selh_new_1_prior                 =    0.67     0.98
ur_selh_new_1_prior_>_1             =   -1.00    -6.38
END