sorry, I didn’t catch that! – an investigation of non-understandings and recovery strategies Dan Alexander I. Computer Science Department Carnegie Mellon University Pittsburgh, PA, 15213
2 systems often do not understand correctly: non-understandings and misunderstandings. NON-understanding: S: What city are you leaving from? U: Urbana Champaign [OKAY IN THAT SAME PAY]; the system cannot extract any meaningful information from the user’s turn. MIS-understanding: S: What city are you leaving from? U: Birmingham [BERLIN PM]; the system extracts incorrect information from the user’s turn.
3 systems often do not understand correctly. NON-understanding: S: What city are you leaving from? U: Urbana Champaign [OKAY IN THAT SAME PAY]; the system cannot extract any meaningful information from the user’s turn. detection: typically trivial, although diagnosis is not. strategies: large space of strategies; tradeoffs between them not well understood. policy (knowing how to engage the strategies): simple heuristics, e.g. “incremental prompting”.
4 questions under investigation what are the main causes of non-understandings? how large is their impact on performance? how do various recovery strategies compare to each other? what are the relationships between strategies and user behaviors? can we improve global dialog performance by using a smarter policy? if yes, can we learn a better policy from data? data
5 data collection. Roomline: phone-based, mixed-initiative system for conference room reservations. experimental design: control group (uninformed recovery policy) and wizard group (recovery policy implemented by wizard); 46 participants, first-time users. tasks & experimental procedure: up to 10 scenario-driven interactions.
6 non-understanding recovery strategies
S: For when do you need the conference room?
1. ASK REPEAT: Could you please repeat that?
2. ASK REPHRASE: Could you please try to rephrase that?
3. NOTIFY (NTFY): Sorry, I didn’t catch that
4. YIELD TURN (YLD): …
5. REPROMPT (RP): For when do you need the conference room?
6. DETAILED REPROMPT (DRP): Right now I need to know the date and time for when you need the reservation …
7. MOVE-ON: Sorry, I didn’t catch that. For which day you need the room?
8. YOU CAN SAY (YCS): Sorry, I didn’t catch that. For when do you need the conference room? You can say something like tomorrow at 10 am …
9. TERSE YOU CAN SAY (TYCS): Sorry, I didn’t catch that. You can say something like tomorrow at 10 am …
10. FULL HELP (HELP): Sorry, I didn’t catch that. I am currently trying to make a conference room reservation for you. Right now I need to know the date and time for when you need the reservation. You can say something like tomorrow at 10 am …
7 corpus statistics: 449 sessions; 8278 user turns; utterances transcribed and checked. manual annotations: misunderstandings; correct concept values at each turn; sources of understanding errors; user response-types to recovery strategies.
8 questions under investigation data what are the main causes of non-understandings? how large is their impact on performance? how do various recovery strategies compare to each other? what are the relationships between strategies and user behaviors?
9 causes of non-understandings [diagram: communication between user and system at four levels (conversation, intention, signal, channel); labels: Goal, Semantics, Text, Audio, channel, End-pointing, Recognition, Parsing, Interpretation]
10 causes of non-understandings conversation level intention level signal level channel level out-of-application 16% out-of-grammar 16% ASR error 62% endpointer error
11 questions under investigation data what are the main causes of non-understandings? how large is their impact on performance? how do various recovery strategies compare to each other? what are the relationships between strategies and user behaviors? data : causes of non-understandings : impact on performance : strategy comparison : user behaviors
12 modeling impact on performance. logistic regression: $P(\text{Task Success}) = \frac{1}{1 + e^{-(\alpha + \beta \cdot \text{FNON})}}$
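A minimal sketch of how the logistic model on this slide maps a non-understanding measure to a task success probability. The function name `p_task_success` and the coefficient values are hypothetical, chosen only for illustration; they are not the fitted values from the study. FNON is treated here as a per-session number between 0 and 1, which is an assumption.

```python
import math


def p_task_success(fnon: float, alpha: float, beta: float) -> float:
    """Logistic model: P(success) = 1 / (1 + exp(-(alpha + beta * fnon)))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * fnon)))


# Hypothetical coefficients for illustration only (not fitted values).
ALPHA, BETA = 2.0, -8.0

for fnon in (0.0, 0.1, 0.25, 0.5):
    p = p_task_success(fnon, ALPHA, BETA)
    print(f"FNON={fnon:.2f} -> P(task success)={p:.3f}")
```

With a negative β, the predicted success probability falls as FNON grows, which is the kind of quantitative relationship the regression is meant to capture.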
13 questions under investigation data what are the main causes of non-understandings? how large is their impact on performance? how do various recovery strategies compare to each other? what are the relationships between strategies and user behaviors? data : causes of non-understandings : impact on performance : strategy comparison : user behaviors
14 strategy performance – recovery rate. overall logistic ANOVA: significant differences in mean recovery rates; all pairs comparison (corrected using FDR). [chart: recovery rate by strategy: MoveOn, Help, TerseYouCanSay, RePrompt, YouCanSay, AskRephrase, DetailedReprompt, Notify, AskRepeat, Yield]
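The slide says only that the all-pairs comparison was "corrected using FDR". One common way to control the false discovery rate is the Benjamini-Hochberg step-up procedure, sketched below. Whether the study used this exact procedure is an assumption, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the hypothesis is rejected
    at false discovery rate q (Benjamini-Hochberg step-up procedure)."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Made-up p-values standing in for pairwise strategy comparisons.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

With 10 strategies there are 45 pairwise tests, so a correction of this kind keeps the share of spurious "significant" pairs under control.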
15 questions under investigation data what are the main causes of non-understandings? how large is their impact on performance? how do various recovery strategies compare to each other? what are the relationships between strategies and user behaviors? data : causes of non-understandings : impact on performance : strategy comparison : user behaviors
16 user response types. tagging scheme by Shin, also used by Choularton, Raux. 5 categories: repeat, rephrase, contradict, change, other.
17 response types after non-understanding [bar chart: percentage of responses (0% to 50%) by response type (rephrase, repeat, contradict, change, other) for three corpora: Pizza (Choularton & Dale), Communicator (Shin et al.), Roomline (this study)]
18 user response types by strategy [stacked bar chart: share of responses (0% to 100%) of each type (Rephrase, Change, Repeat, Other) for each strategy: MoveOn, Help, TerseYouCanSay, RePrompt, YouCanSay, AskRephrase, DetailedReprompt, Notify, AskRepeat, Yield]
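A per-strategy breakdown like the one on this slide can be tabulated from tagged turns roughly as follows. This is a sketch under assumptions: the `(strategy, response_type)` pair format and the function name are hypothetical, and the sample data is made up.

```python
from collections import Counter, defaultdict


def response_distribution(tagged_turns):
    """Map each strategy to the share of each user response type.

    tagged_turns: iterable of (strategy, response_type) pairs.
    """
    counts = defaultdict(Counter)
    for strategy, response in tagged_turns:
        counts[strategy][response] += 1
    distribution = {}
    for strategy, counter in counts.items():
        total = sum(counter.values())
        distribution[strategy] = {r: n / total for r, n in counter.items()}
    return distribution


# Made-up sample, for illustration only.
sample = [
    ("AskRepeat", "repeat"), ("AskRepeat", "repeat"),
    ("AskRepeat", "rephrase"), ("AskRepeat", "other"),
    ("AskRephrase", "rephrase"), ("AskRephrase", "repeat"),
]
for strategy, shares in response_distribution(sample).items():
    print(strategy, shares)
```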
19 summary. sources of non-understandings: ASR, but also “language” errors → more shaping strategies … impact on performance: regression model allows better quantitative assessment. strategy comparison: help, “move-on” → further investigate “move-on”. user responses: margin for improving control over user responses. can we improve global dialog performance by using a smarter policy? yes. can we learn a better policy from data? preliminary results promising …
20 thank you! questions …
21 rejections. Figure 3. Misunderstandings and non-understandings before and after rejections [chart panels: before rejection mechanism, after rejection mechanism; categories: false rejections, correct rejections]
22 strategy performance assessment. recovery rate. recovery utility: weighted sum of correctly and incorrectly acquired concepts; weights are determined in a data-driven fashion. recovery efficiency: also takes time to recovery into account.
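The recovery utility described above is a weighted sum over correctly and incorrectly acquired concepts. A minimal sketch follows; the weights here are placeholders (the slide says the real weights are determined in a data-driven fashion and does not give them), and the function name is hypothetical.

```python
def recovery_utility(n_correct, n_incorrect, w_correct=1.0, w_incorrect=-2.0):
    """Weighted sum of correctly and incorrectly acquired concepts.

    The default weights are placeholders for illustration, not values
    from the study.
    """
    return w_correct * n_correct + w_incorrect * n_incorrect


# A recovery turn that acquires two correct concepts and one incorrect one.
print(recovery_utility(n_correct=2, n_incorrect=1))
```

Giving incorrect concepts a negative weight lets the metric penalize a recovery that trades a non-understanding for a misunderstanding, which plain recovery rate does not distinguish.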
23 experimental design: scenarios 10 scenarios, fixed order presented graphically (explained during briefing)
24 strategy pair-wise comparison. recovery performance ranked list, based on pair-wise t-tests [table: columns RNK, MOVE, HELP, TYCS, RP, YCS, ARPH, DRP, NTFY, AREP, YLD; rows: 1 MOVE, 2 HELP, 3 TYCS, 4 RP, 5 YCS, 6 ARPH, ? DRP, ? NTFY, ? AREP, ? YLD]. CER evaluation shows similar results
25 recovery for various response-types
27 impact of recovery rate on performance: $P(\text{Task Success}) = \frac{1}{1 + e^{-(\alpha + \beta \cdot \text{RecoveryRate})}}$, where recovery = next turn is correctly understood
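Using the definition on this slide (a recovery means the next user turn is correctly understood), per-strategy recovery rates could be computed from a turn log along these lines. The log format, a list of `(strategy, next_turn_understood)` pairs, and the function name are assumptions made for illustration, and the sample data is made up.

```python
from collections import defaultdict


def recovery_rates(events):
    """Fraction of non-understandings recovered, per strategy.

    events: iterable of (strategy, next_turn_understood) pairs, one per
    non-understanding after which the strategy was engaged.
    """
    totals = defaultdict(int)
    recovered = defaultdict(int)
    for strategy, understood in events:
        totals[strategy] += 1
        if understood:
            recovered[strategy] += 1
    return {s: recovered[s] / totals[s] for s in totals}


# Made-up log, for illustration only.
log = [
    ("MOVE", True), ("MOVE", True), ("MOVE", False), ("MOVE", True),
    ("YLD", False), ("YLD", True),
]
print(recovery_rates(log))
```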