Published by Sara Nicholson. Modified over 9 years ago.
1 An Investigation into Recovering from Non-understanding Errors
Dan Bohus
Dialogs on Dialogs Reading Group Talk
Carnegie Mellon University, October 2004
2 Non-understandings
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
- System knows there was a user turn, but
  - There is no relevant semantic information in the input
  - Confidence is too low to trust any semantic information in the input
- 10 – 30% of turns in a mixed-initiative system
- GOAL: Do a better job at recovering from non-understandings
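The two conditions on this slide can be written as a small check. This is a minimal sketch, not the system's actual detector: the `RecognitionResult` container, its field names, and the `REJECTION_THRESHOLD` value are all assumptions made for illustration, since the slides do not state them.

```python
from dataclasses import dataclass, field

# Hypothetical value; the slides do not state the rejection threshold used.
REJECTION_THRESHOLD = 0.3


@dataclass
class RecognitionResult:
    """Hypothetical container for one decoded user turn."""
    hypothesis: str                                # recognized text
    concepts: dict = field(default_factory=dict)   # semantic slots parsed from it
    confidence: float = 0.0                        # confidence score in [0, 1]


def is_non_understanding(result: RecognitionResult,
                         threshold: float = REJECTION_THRESHOLD) -> bool:
    """Return True when the turn yields no usable semantic information."""
    if not result.concepts:
        # No relevant semantic information in the input.
        return True
    # Confidence too low to trust the semantic information that was parsed.
    return result.confidence < threshold
```

Under these assumptions, the example turn on the slide (recognized as "OKAY IN THAT SAME PAY", with no parsed concepts) is flagged by the first condition.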
3 Recovery Ingredients
- Detection
- Set of strategies (actions)
- Policy (method for choosing between actions)
4 Recovery Ingredients – Non-understandings
- Detection
  - Generally, system knows when a non-understanding happened
- Set of strategies (actions)
  - Notify non-understanding, repeat question, ask repeat/rephrase, provide help, etc.
- Policy (method for choosing between actions)
  - Traditionally fixed heuristic
5 Issues under Investigation
- Detection
  - Analysis of error types, blame assignment, impact on task performance
  - Detection of error type
  - Adaptation of rejection threshold
- Set of strategies
  - Investigate individual strategy performance
  - Identify potential new strategies
- Policy
  - Impact of a “smarter” policy on performance
  - Building a policy from data
7 Experimental Design - Overview
- Subjects interact over the telephone with RoomLine
- Perform a number of scenario-based tasks
- Between-subjects experiment
  - Control: system uses a random (uniform) policy for engaging the non-understanding recovery strategies
  - Wizard: policy is determined at runtime by a human (wizard)
- 46 subjects, balanced by gender x native/non-native status
8 Non-understanding Strategies
Strategy groups (slide labels): MOVE-ON, HELP, SIGNAL
S: For when do you need the room?
U: [non-understanding]
- FAIL: Sorry, I didn’t catch that. Tell me for what day you need the room
- YOU CAN SAY (YCS): Sorry, I didn’t catch that. For when do you need the conference room? You can say something like tomorrow at 10 am …
- TERSE YOU CAN SAY (TYCS): Sorry, I didn’t catch that. You can say something like tomorrow at 10 am …
- FULL HELP (HELP): Sorry, I didn’t catch that. I am currently trying to make a conference room reservation for you. Right now I need to know the date and time for when you need the reservation. You can say something like tomorrow at 10 am …
- ASK REPEAT (AREP): Could you please repeat that?
- ASK REPHRASE (ARPH): Could you please try to rephrase that?
- NOTIFY (NTFY): Sorry, I don’t think I understood you correctly…
- YIELD TURN (YLD): …
- REPEAT SYSTEM PROMPT (REPP): For when do you need the conference room?
- EXPLAIN MORE (EXPL): Right now I need to know the date and time for when you need the reservation …
Verb. V T A T T T T A T
Prompt. Y N Y N N N N Y Y
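The control condition's random (uniform) policy over these strategies could look roughly like the following. This is an illustrative sketch only: the `uniform_policy` function and the `STRATEGIES` list are hypothetical names, and the list simply reuses the abbreviations shown on the slide.

```python
import random

# Strategy abbreviations as listed on the slide.
STRATEGIES = ["FAIL", "YCS", "TYCS", "HELP", "AREP",
              "ARPH", "NTFY", "YLD", "REPP", "EXPL"]


def uniform_policy(rng: random.Random, strategies=STRATEGIES) -> str:
    """Pick a recovery strategy uniformly at random, ignoring dialogue context."""
    return rng.choice(strategies)


# Usage: pass a seeded generator so a session can be replayed.
rng = random.Random(2004)
chosen = uniform_policy(rng)
```

Because every strategy is equally likely regardless of context, data collected this way allows the strategies to be compared with each other without a selection bias from the policy.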
9 Experimental Design: Scenarios
- Presented graphically (explained during briefing)
10 Corpus Statistics / Characteristics
- 46 users; 484 sessions; ~ 9000 turns
- Transcribed
- Annotated with:
  - Misunderstandings & deletions
  - Non-understandings
  - Concept transfer accuracy
  - Transcript grammaticality labels: OK, OOR, OOG, OOS, OOD, VOID
  - Correct concept values in each turn – [ongoing]
11 Back to the Issues
- Detection
  - Analysis of error types, blame assignment, impact on task performance
  - Detection of error type
  - Adaptation of rejection threshold
- Set of strategies
  - Investigate individual strategy performance
  - Identify potential new strategies
- Policy
  - Impact of a “smarter” policy on performance
  - Building a policy from data
12 Impact of Policy on Performance
- General picture
  - Significant improvements for non-natives, especially after non-understandings
- Global
  - Task success: significant improvements (x1.77) for non-natives
  - SASSI scores: nothing detectable
- Local
  - WER: significant improvements across the board
  - Understanding error metrics (CT, CER, NONU, MIS): significant improvement for non-natives
- Recovery
  - Nothing detectable (?)
  - Faster on the wizard side
13 Impact of Policy on Performance …
- Weird stuff
- Conclusion?
14 Back to the Issues
- Detection
  - Analysis of error types, blame assignment, impact on task performance
  - Detection of error type
  - Adaptation of rejection threshold
- Set of strategies
  - Investigate individual strategy performance
  - Identify potential new strategies
- Policy
  - Impact of a “smarter” policy on performance
  - Building a policy from data
15 Impact on task performance
- Models for predicting task success from various types of errors [show in Matlab]
- Can shed more light on:
  - Effect of the policy
  - Native / non-native differences
  - Costs of various types of errors
- Currently analyzing it. Issues:
  - Build (state-)conditioned cost models
  - Robustness
16 Back to the Issues
- Detection
  - Analysis of error types, blame assignment, impact on task performance
  - Detection of error type
  - Adaptation of rejection threshold
- Set of strategies
  - Investigate individual strategy performance
  - Identify potential new strategies
- Policy
  - Impact of a “smarter” policy on performance
  - Building a policy from data
17 Individual strategy performance
- Under “random”/uniform conditions (control)
  - All-way comparison: Matlab, summary file (rank analysis ?)
  - First conclusions:
    - Moving-on helps
    - Help helps
    - Just signaling is not so good, YLD is pretty bad
- Compare with wizard:
  - Ask Repeat boosted (significantly x1.58)
  - Wizard reverse engineering (?)
  - HELP / FAIL behavior in non-natives (?)
- Predicting success: when to help, when to ask repeat?
19 Back to the Issues
- Detection
  - Analysis of error types, blame assignment, impact on task performance
  - Detection of error type
  - Adaptation of rejection threshold
- Set of strategies
  - Investigate individual strategy performance
  - Identify potential new strategies
- Policy
  - Impact of a “smarter” policy on performance
  - Building a policy from data
20 Identify Potential New Strategies
- Better informed by the error-type / blame assignment analysis (top of my stack)
- So far
  - Ask user to speak shorter
  - Ask user to speak louder
  - Speculative execution
21 Speculative execution
- A lot of small recognition errors appear repeatedly
  - YES > THIS, NEXT GUEST > YES GUEST USER > TUESDAY Etc…
- Learn from experience how to avoid these errors
- Example:
  S: Did you say you wanted a room for Tuesday?
  U: YES [THIS]
  S: Sorry, I didn’t catch that. Did you say you wanted a room for Tuesday?
  U: YES [YES]
- Learn that “THIS” actually means “YES”
22 Speculative execution - components
- Learn mapping
  - Learner with high precision (no false positives)
- Apply mapping
  - Learner with high recall
- Precision / Recall tradeoff
- How much can this method really buy us?
23 Speculative Execution – 0th cut
- Conservative Learner
  - Learns from non-understanding segments where
    - Dialogue state is the same throughout (mapping is state-specific)
    - Final response is in focus, contains only one concept and has high confidence
- Conservative Applier
  - Apply only when dialogue state matches and non-understood input matches perfectly at the state level
- Going through the whole dataset, learning as you go, results:
  - 10% application at the end, does not asymptote yet
  - Precision? (480 rules learned)
- How does this look to you?
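The conservative learner and applier described on this slide might be sketched as below. This is an illustration under stated assumptions, not the code used in the experiment: the `Turn` fields, the `SpeculativeMapper` class, and the `HIGH_CONFIDENCE` cutoff are hypothetical, and "matches perfectly" is interpreted here as an exact string match on the recognized text within the same dialogue state.

```python
from dataclasses import dataclass

HIGH_CONFIDENCE = 0.9  # hypothetical cutoff; not given on the slides


@dataclass
class Turn:
    state: str          # dialogue state in which the user spoke
    hypothesis: str     # recognized text, e.g. "THIS"
    concepts: tuple     # concepts parsed from the hypothesis
    confidence: float   # recognition confidence in [0, 1]
    in_focus: bool      # does the response address the current question?


class SpeculativeMapper:
    def __init__(self):
        # (dialogue state, non-understood text) -> text it most likely meant
        self.rules = {}

    def learn(self, segment):
        """Learn from a non-understanding segment.

        `segment` is a list of non-understood turns followed by the final,
        successfully understood turn.
        """
        if len(segment) < 2:
            return  # nothing was non-understood, so nothing to learn
        *failed, final = segment
        same_state = all(t.state == final.state for t in failed)
        trustworthy = (final.in_focus
                       and len(final.concepts) == 1
                       and final.confidence >= HIGH_CONFIDENCE)
        if same_state and trustworthy:
            for t in failed:
                self.rules[(t.state, t.hypothesis)] = final.hypothesis

    def apply(self, state, hypothesis):
        """Return the learned rewrite on an exact state and text match, else None."""
        return self.rules.get((state, hypothesis))
```

With the example from the earlier slide, a segment in which "THIS" is followed by a confident "YES" in the same confirmation state would produce a rule mapping "THIS" to "YES" for that state only.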
24 Speculative execution
- Of course much more to dig in here …
  - Learners which generalize more
  - Confidence score on the rules
  - Active learning: appliers with confidence, and feedback into learning
  - Potentially use it in other cases (not only non-understandings, but potential misunderstandings)
25 Back to the Issues
- Detection
  - Analysis of error types, blame assignment, impact on task performance
  - Detection of error type
  - Adaptation of rejection threshold
- Set of strategies
  - Investigate individual strategy performance
  - Identify potential new strategies
- Policy
  - Impact of a “smarter” policy on performance
  - Building a policy from data
26 Building a Policy from Data
- Experiment showed that the wizard boosted the performance of Ask Repeat
- Can we predict likelihood of success for each strategy from features available online?
  - Identify informative features
    - Might be better informed by error-type/blame-assignment analysis
  - Try simple classifiers
- MDP (?)
- Can also formulate problem as a decision boundary or classification problem… (?)
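One very simple way to estimate the likelihood of success per strategy from logged data is a smoothed success rate for each combination of feature values and strategy. The sketch below is a frequency-based stand-in chosen for illustration, not a classifier proposed in the talk; the `StrategySuccessModel` class and the example feature values are assumptions.

```python
from collections import defaultdict


class StrategySuccessModel:
    """Smoothed recovery-rate estimate per (feature tuple, strategy)."""

    def __init__(self, strategies):
        self.strategies = list(strategies)
        # (features, strategy) -> [successful recoveries, trials]
        self.counts = defaultdict(lambda: [0, 0])

    def update(self, features, strategy, recovered):
        """Record one logged recovery attempt; `features` must be hashable."""
        entry = self.counts[(features, strategy)]
        entry[0] += int(recovered)
        entry[1] += 1

    def p_success(self, features, strategy):
        """Add-one (Laplace) smoothed estimate; 0.5 when nothing was observed."""
        successes, trials = self.counts[(features, strategy)]
        return (successes + 1) / (trials + 2)

    def best(self, features):
        """Strategy with the highest estimated success rate for these features."""
        return max(self.strategies, key=lambda s: self.p_success(features, s))
```

A model like this only works with a handful of discrete features; with richer online features, a learned classifier per strategy would be the natural next step, as the slide suggests.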
27 Thank you!
28 Experimental Design: Control vs Wizard Conditions
- Control: random (uniform) policy
- Wizard: human with access to audio & system state
Performance
- Random (uniform) policy
- Manually designed policy
- Data-driven designed policy
- Human wizard with access to audio ?
- Human wizard with access to only system state ?