a “k-hypotheses + other” belief updating model
Dan Bohus, Alex Rudnicky; Computer Science Department, Carnegie Mellon University, Pittsburgh, PA
acknowledgements: Tim Paek, Eric Horvitz; Microsoft Research
2/25 motivation spoken language interfaces are still very brittle [Parade, Sunday, March 26]
3/25 confidence scores, misunderstandings
S: What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul… What day did you need to travel?
U: no no I’m traveling to Birmingham [THE TRAVELING to berlin P_M]
S: traveling in the afternoon… okay, what day would you be departing Chicago?
U: the tenth of August [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [flight destination mr WEEKEND]
S: traveling on Saturday, August 12th… I have a flight departing Chicago at 1:40pm arrives Seoul at …
confidence scores: / 0.72 / 0.35 / 0.58 / 0.65 / 0.28
arrival = {Seoul / 0.65}
4/25 confidence scores, misunderstandings (same dialog as slide 3/25)
arrival = {Seoul / 0.65} → f → arrival = ?
arrival = { … }, departure = { … }
5/25 belief updating: problem statement
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
arrival = {Seoul / 0.65} → f → arrival = ?
given an initial belief B_initial(C) over concept C, a system action SA(C), and a user response R,
construct an updated belief B_updated(C) ← f(B_initial(C), SA(C), R)
6/25 outline: introduction; current solutions; approach; experimental results; effects on global performance; conclusion and future work
intro : current solutions : approach : experimental results : global performance : conclusion
7/25 current solutions
S: traveling from Chicago. Where would you like to go?
U: [SEOUL]
S: traveling to Seoul… what day did you need to travel?
U: [THE TRAVELING to berlin P_M]
confidence scores: / 0.65 / 0.35 / 0.72
arrival = {Seoul / 0.65} → f → arrival = ?
confidence scores / detecting misunderstandings [Cox, Chase, Bansal, Hazen, Ravishankar, Walker, San-Segundo, Bohus]
detecting corrections [Litman, Swerts, Hirschberg, Krahmer, Levow]
track single values; use simple heuristic belief updating rules
explicit confirmations: yes / no
implicit confirmations: new values overwrite old values
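The heuristic rules on this slide are only named, not specified. The following is a minimal sketch of what such single-value rules might look like; the class `SingleValueBelief`, the function `heuristic_update`, the action labels, and the exact effects of a "yes" or "no" answer are all assumptions made for illustration, not the rules used in the actual system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SingleValueBelief:
    """A concept tracked as one value plus a confidence score."""
    value: Optional[str] = None
    confidence: float = 0.0


def heuristic_update(
    belief: SingleValueBelief,
    action: str,
    answer: Optional[str] = None,
    new_value: Optional[str] = None,
    new_conf: float = 0.0,
) -> SingleValueBelief:
    """Apply hand-written update rules (illustrative assumptions only)."""
    if action == "explicit_confirm":
        if answer == "yes":
            # Assumed rule: a "yes" makes the system fully trust the value.
            return SingleValueBelief(belief.value, 1.0)
        if answer == "no":
            # Assumed rule: a "no" discards the value.
            return SingleValueBelief(None, 0.0)
        return belief
    # Any other action: a newly heard value overwrites the old one.
    if new_value is not None:
        return SingleValueBelief(new_value, new_conf)
    return belief


belief = SingleValueBelief("Austin", 0.6)
overwritten = heuristic_update(belief, "implicit_confirm",
                               new_value="Aspen", new_conf=0.4)
```

In this sketch a single low-confidence recognition result replaces the earlier value outright, which is the kind of brittleness that motivates tracking several hypotheses instead.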
9/25 belief updating: problem statement
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
arrival = {Seoul / 0.65} → f → arrival = ?
given an initial belief B_initial(C) over concept C, a system action SA(C), and a user response R,
construct an updated belief B_updated(C) ← f(B_initial(C), SA(C), R)
10/25 belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
probability distribution over the set of possible values
departure: ABERDEEN, TX; ABILENE, TX; ALBANY, NY; ALBUQUERQUE, NM; ALLENTOWN, PA; ALEXANDRIA, LA; ALLAKAKET, AK; ALLIANCE, NE; ALPENA, MI; ALPINE, TX; … YUMA, AZ
however, the system “hears” only a small number of conflicting values for a concept throughout a session (max = 3 conflicting values heard)
11/25 belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
compressed belief representation: k hypotheses + other
dynamically add and drop hypotheses: remember m hypotheses, add n new ones (m+n=k)
B…(C) is a multinomial variable of degree k+1
departure_city [k=3, m=2, n=1]
hypotheses: Austin, Boston, Houston, other
S: Did you say you were flying from Austin?
U: [NO ASPEN]
hypotheses: Boston, Austin, other; new hypothesis: Aspen
S: flying from Aspen… what is your destination?
U: [NO NO I DIDN’T THAT THAT]
hypotheses: Boston, Aspen, other; new hypothesis: Ø
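As a rough illustration of the "k hypotheses + other" bookkeeping, the sketch below keeps the m most probable hypotheses, folds the remaining probability mass into "other", and admits up to n newly heard values. The function names `compress` and `candidate_set` and the probabilities in the example are hypothetical; the slide gives only the hypothesis names, not their probabilities.

```python
from typing import Dict, List


def compress(belief: Dict[str, float], m: int) -> Dict[str, float]:
    """Keep the m most probable hypotheses; fold the rest into 'other'."""
    named = {v: p for v, p in belief.items() if v != "other"}
    top = sorted(named, key=named.get, reverse=True)[:m]
    kept = {v: named[v] for v in top}
    kept["other"] = 1.0 - sum(kept.values())
    return kept


def candidate_set(belief: Dict[str, float], new_values: List[str],
                  m: int, n: int) -> List[str]:
    """Hypotheses the update ranges over: m remembered, up to n new, other."""
    kept = compress(belief, m)
    fresh = [v for v in new_values if v not in kept][:n]
    return [v for v in kept if v != "other"] + fresh + ["other"]


# Made-up probabilities for the departure_city example (k=3, m=2, n=1).
belief = {"Austin": 0.5, "Boston": 0.3, "Houston": 0.1, "other": 0.1}
hypotheses = candidate_set(belief, ["Aspen"], m=2, n=1)
```

With these made-up numbers, Houston is dropped into "other" and Aspen joins Austin and Boston, so the updated belief is a distribution over k+1 = 4 outcomes.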
12/25 system action
B_updated(C) ← f(B_initial(C), SA(C), R)
request
S: When would you like to take this flight?
U: Friday [FRIDAY] / 0.65
explicit confirmation
S: Did you say you wanted to fly this Friday?
U: Yes [GUEST] / 0.30
implicit confirmation
S: A flight for Friday … at what time?
U: At ten a.m. [AT TEN A_M] / 0.86
no action / unexpected update
S: okay. I will complete the reservation. Please tell me your name or say ‘guest user’ if you are not a registered user.
U: guest user [THIS TUESDAY] / 0.55
13/25 user response
B_updated(C) ← f(B_initial(C), SA(C), R)
acoustic / prosodic: acoustic and language scores, duration, pitch information, voiced-to-unvoiced ratio, speech rate, initial pause
lexical: number of words, presence of words highly correlated with corrections or acknowledgements
grammatical: number of slots (new and repeated), goodness-of-parse scores
dialog: dialog state, turn number, expectation match, timeout, barge-in, concept identity
priors: priors for concept values
confusability: how confusable concept values are
14/25 approach
B_updated(C) ← f(B_initial(C), SA(C), R)
multinomial regression problem
multinomial generalized linear model
sample efficient; stepwise approach; feature selection
one separate model for each system action: B_updated(C) ← f_SA(C)(B_initial(C), R)
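To make the "one multinomial model per system action" idea concrete, here is a small sketch of a multinomial logistic (softmax) model selected by system action. The outcome labels, feature names, and weights in `MODELS` are invented for illustration and are not the trained models; only the overall shape (a separate model per action, features of the user response as input, a distribution over hypotheses as output) follows the slide.

```python
import math
from typing import Dict


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn real-valued scores into probabilities that sum to 1."""
    peak = max(scores.values())
    exps = {h: math.exp(s - peak) for h, s in scores.items()}
    total = sum(exps.values())
    return {h: e / total for h, e in exps.items()}


def update_belief(models, system_action: str,
                  features: Dict[str, float]) -> Dict[str, float]:
    """Select the model for the system action and score each outcome."""
    weights = models[system_action]  # one weight table per system action
    scores = {
        hyp: sum(w.get(name, 0.0) * value for name, value in features.items())
        for hyp, w in weights.items()
    }
    return softmax(scores)


# Hypothetical weights for k=2: the initial top hypothesis, one new
# hypothesis, and "other".
MODELS = {
    "explicit_confirm": {
        "initial_top": {"bias": 0.0, "answer_yes": 2.0},
        "new_1": {"bias": -1.0, "answer_no": 1.5},
        "other": {"bias": -1.0, "answer_no": 1.0},
    }
}

updated = update_belief(MODELS, "explicit_confirm",
                        {"bias": 1.0, "answer_yes": 1.0})
```

Keeping a separate weight table per action means each model only needs the features that matter for that action, which is one plausible reading of the sample-efficiency point above.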
16/25 data
RoomLine: conference room reservations; explicit and implicit confirmations
user study: 46 participants, 10 scenario-based interactions each
corpus: 449 sessions, 8848 user turns; transcribed & annotated (misunderstandings, corrections, correct concept values)
17/25 model performance
initial baseline (i) [error before update]
heuristic baseline (h) [error after heuristic update]
Model (M) [k=2, all features]
correction baseline (c) [error if we had perfect correction detection]
[figure: bar charts of error rate for each system action: explicit confirm, implicit confirm, request, no action]
19/25 a new user study …
implemented models in the system
2nd, between-subjects experiment
control: using heuristic update rules
treatment: using belief updating models
40 participants, non-native users
improvements more likely at high word-error-rates
20/25 effect on task success
logistic ANOVA on task success: logit(TaskSuccess) ← … ∙WER … ∙Condition; p = …
[figure: probability of task success vs. word error rate, treatment and control; marked points: 78% (treatment) and 64% (control) at 30% word error rate; 78% at 16% word error rate]
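The fitted coefficients are not legible on this slide, so the sketch below only shows the general form of such a logistic model: the probability of task success as the inverse logit of a linear function of word error rate and experimental condition. The function name, the presence of an intercept, the absence of an interaction term, and the coefficient values in the example are all assumptions and do not reproduce the reported fit.

```python
import math


def task_success_probability(wer: float, treatment: bool, intercept: float,
                             wer_coef: float, condition_coef: float) -> float:
    """Inverse logit of a linear predictor in WER and condition."""
    condition = 1.0 if treatment else 0.0
    linear = intercept + wer_coef * wer + condition_coef * condition
    return 1.0 / (1.0 + math.exp(-linear))


# Made-up coefficients, chosen only to show the shape of the comparison.
control = task_success_probability(30.0, False, 2.0, -0.05, 0.7)
treated = task_success_probability(30.0, True, 2.0, -0.05, 0.7)
```

With a positive condition coefficient, the treatment curve sits above the control curve at every word error rate, which is the pattern the plot describes.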
21/25 how about efficiency?
ANOVA on task duration for successful tasks: Duration ← … ∙WER … ∙Condition; p = …
significant improvement, equivalent to 7.9% absolute reduction in word-error
23/25 summary
U: [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: [SEOUL]
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
S: traveling in the afternoon. Okay, what day would you be departing Chicago?
confidence scores: / 0.72 / 0.35 / 0.65
arrival = {Seoul / 0.65} → f → arrival = ?
arrival = { … }, departure = { … }
approach for constructing accurate beliefs
integrate information across multiple turns
significant gains in task success and efficiency
24/25 other advantages
learns from data: tuned to the domain in which it operates
sample efficient / scalable: local one-turn optimization, concepts are independent; RoomLine operates with 29 concepts (cardinality: 2 to several hundreds)
portable: decoupled from dialog task specification; no assumptions about dialog management
25/25 future work
integrate information from n-best list
integrate other high-level knowledge: domain-specific constraints, inter-concept dependencies
investigate technique in other domains
26/25 thank you! questions …
27/25 improvements at different WER
[figure: absolute improvement in task success vs. word-error-rate]
28/25 user study
10 scenarios, fixed order
presented graphically (explained during briefing)
participants compensated per task success
29/25 informative features
priors and confusability; initial confidence scores; concept identity; barge-in; expectation match; repeated grammar slots
30/25 Models (k=2, runtime features)
# The model for the explicit confirm action
new_1 other
LR_MODEL(EC)
k =
answer_type[YES] =
answer_type[NO] =
answer_type[OTHER] =
concept_id(equip) =
i_th_confusability =
ih_diff_lexical_one_word =
lexw1[SMALL] =
response_new_hyps_in_selh =
END
31/25 Models (k=2, runtime features)
# The model for the implicit confirm action
new_1 other
LR_MODEL(IC)
mark_confirm =
mark_disconfirm =
i_th_conf =
i_th_confusability =
k =
lex[THREE] =
response_new_hyps_in_selh =
turn_number =
END
32/25 Models (k=2, runtime features)
# The model for the request action
new_1 other
LR_MODEL(REQ)
k =
barge_in =
concept_id(date) =
concept_id(user_name) =
dialog_state[RequestSpecificTimes] =
ih_diff_lexical =
initial_num_hyps_>_0 =
total_num_parses =
ur_selh_new_1_conf =
ur_selh_new_1_confusability =
ur_selh_new_1_prior =
ur_selh_new_1_prior_>_1 =
END