Slide 1: Cost of Misunderstandings: Modeling the Cost of Misunderstanding Errors in the CMU Communicator Dialog System
Presented by: Dan Bohus (dbohus@cs.cmu.edu)
Work by: Dan Bohus, Alex Rudnicky
Carnegie Mellon University, 2001
Slide 2: Outline
- Quick overview of previous utterance-level confidence annotation work
- Modeling the cost of misunderstandings in spoken dialog systems
- Experiments & results
- Further analysis
- Summary, further work, conclusion
Slide 3: Utterance-Level Confidence Annotation: Overview
- Confidence annotation = data-driven classification
- Corpus: 2 months, 131 dialogs, 4550 utterances
- Features: 12 features from the decoder, parsing, and dialog management levels
- Classifiers: Decision Tree, ANN, BayesNet, AdaBoost, NaiveBayes, SVM, plus a Logistic Regression model (added later on); a minimal sketch of this setup follows below
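As an illustration only (not the authors' original code), here is a minimal sketch of the data-driven classification setup described above. The placeholder features, labels, and the use of scikit-learn are my assumptions; only the corpus size and the idea of comparing several classifiers come from the slide.

```python
# Hypothetical sketch: train a few confidence classifiers on utterance-level
# features and compare their classification error rates.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(4550, 12))      # 12 decoder/parse/dialog-level features (placeholder values)
y = rng.integers(0, 2, size=4550)    # 1 = misunderstood utterance, 0 = OK (placeholder labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, clf in [("decision tree", DecisionTreeClassifier(max_depth=5)),
                  ("naive Bayes", GaussianNB()),
                  ("logistic regression", LogisticRegression(max_iter=1000))]:
    clf.fit(X_tr, y_tr)
    error_rate = 1.0 - clf.score(X_te, y_te)   # (FP + FN) / N on the held-out set
    print(f"{name}: error rate = {error_rate:.3f}")
```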
Slide 4: Confidence Annotator Performance
- Baseline error rate: 32%
- Garble baseline: 25%
- Classifier error rate: 16%
- Differences between classifiers are statistically insignificant, except for Naive Bayes
- On a soft metric, the logistic regression model clearly outperformed the others
- But is this the right way to evaluate performance?
Slide 5: Judging Performance
- Classification error rate (FP + FN) implicitly assumes that FP and FN errors have the same cost
- But the cost of a misunderstanding in a dialog system is presumably different for FPs and FNs
- Instead, build an error function that takes these costs into account, and optimize for it
- The cost also depends on the domain/system (not a problem) and on the dialog state
Slide 6: Problem Formulation
(1) Develop a cost model that allows us to quantitatively assess the costs of FP and FN errors.
(2) Use the costs to pick the optimal tradeoff point on the classifier's ROC curve.
Slide 7: The Cost Model
- Model the impact of FPs and FNs on system performance
- Identify a suitable performance metric P
- Build a statistical regression model at the dialog-session level: P = f(FPs, FNs)
- Linear regression: P = k + Cost_FP * FP + Cost_FN * FN (see the sketch below)
- We can then plot f and implicitly optimize for P
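A minimal sketch of the session-level regression P = k + Cost_FP*FP + Cost_FN*FN, assuming per-session FP/FN counts and a performance score P are already available; the numbers below are invented placeholders, not the Communicator data.

```python
# Fit P = k + C_FP*FP + C_FN*FN across dialog sessions with ordinary least squares;
# the fitted coefficients play the role of (negative) error costs.
import numpy as np
from sklearn.linear_model import LinearRegression

FP = np.array([3, 1, 4, 0, 2, 5])                 # false acceptances per session (placeholder)
FN = np.array([1, 2, 0, 1, 3, 2])                 # false rejections per session (placeholder)
P  = np.array([0.5, 0.7, 0.4, 0.9, 0.5, 0.2])     # session performance metric (placeholder)

X = np.column_stack([FP, FN])
model = LinearRegression().fit(X, P)

k = model.intercept_
cost_fp, cost_fn = model.coef_
print(f"P ~ {k:.2f} + ({cost_fp:.2f})*FP + ({cost_fn:.2f})*FN, R^2 = {model.score(X, P):.2f}")
```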
Slide 8: Measuring Performance
- User satisfaction (5-point scale): hard to get, and very subjective, so hard to make consistent across users
- Concept transfer efficiency:
  - CTC: correctly transferred concepts per turn
  - ITC: incorrectly transferred concepts per turn
- Completion
Slide 9: Detour: The Dataset
- 134 dialogs (2561 utterances), collected using 4 scenarios
- Satisfaction scores available for only 35 dialogs
- Corpus manually labeled at the concept level with 4 labels: OK / RBAD / PBAD / OOD
- Aggregate utterance labels generated from the concept labels
- Confidence annotator decisions logged
- Counts of FPs, FNs, CTCs, and ITCs computed for each session
Slide 10: Example
U: I want to fly from Pittsburgh to Boston
S: I want to fly from Pittsburgh to Austin
C: [I_want/OK] [Depart_Loc/OK] [Arrive_Loc/RBAD]
- Only 2 relevantly expressed concepts
- If accepted: CTC = 1, ITC = 1
- If rejected: CTC = 0, ITC = 0
(The sketch below walks through this arithmetic.)
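To make the accept/reject arithmetic above concrete, a small sketch that derives CTC and ITC for this utterance from its concept labels; representing concepts as (name, label) pairs is my own simplification, not the system's internal format.

```python
# Count correctly (CTC) and incorrectly (ITC) transferred concepts for one
# utterance, depending on whether the confidence annotator accepts or rejects it.
concepts = [("Depart_Loc", "OK"), ("Arrive_Loc", "RBAD")]   # the 2 relevantly expressed concepts

def transfer_counts(concepts, accepted):
    if not accepted:            # rejecting the utterance transfers no concepts at all
        return 0, 0
    ctc = sum(1 for _, label in concepts if label == "OK")
    itc = sum(1 for _, label in concepts if label != "OK")
    return ctc, itc

print("accept:", transfer_counts(concepts, True))    # -> (1, 1)
print("reject:", transfer_counts(concepts, False))   # -> (0, 0)
```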
Slide 11: Targeting Efficiency: Model 1
- 3 successively refined models
- Model 1: CTC = FP + FN + TN + k
  - CTC: correctly transferred concepts per turn
  - TN: true negatives

Model                 R² (train)   R² (test)
CTC = FP + FN + TN    0.81         0.73
Slide 12: Targeting Efficiency: Model 2
- Model 2: CTC - ITC = (REC +) FP + FN + TN + k
  - ITC: incorrectly transferred concepts per turn
  - REC: relevantly expressed concepts

Model                             R² (train)   R² (test)
CTC = FP + FN + TN                0.81         0.73
CTC - ITC = FP + FN + TN          0.86         0.78
CTC - ITC = REC + FP + FN + TN    0.89         0.83
Slide 13: Targeting Efficiency: Model 3
- Model 3: CTC - ITC = REC + FPC + FPNC + FN + TN + k
- 2 types of FPs: with concepts (FPC) and without concepts (FPNC)
  (a comparison sketch follows the table)

Model                                     R² (train)   R² (test)
CTC = FP + FN + TN                        0.81         0.73
CTC - ITC = FP + FN + TN                  0.86         0.78
CTC - ITC = REC + FP + FN + TN            0.89         0.83
CTC - ITC = REC + FPC + FPNC + FN + TN    0.94         0.90
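A hedged sketch of how the successively refined efficiency models could be compared by held-out R². The per-session table, its column names, and the pandas/scikit-learn tooling are assumptions; only the model structures come from the slides.

```python
# Compare nested linear models of per-turn concept transfer efficiency
# (CTC - ITC) by their R^2 on a held-out split of the sessions.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("sessions.csv")   # hypothetical per-session counts: REC, FPC, FPNC, FP, FN, TN, CTC, ITC
target = df["CTC"] - df["ITC"]

models = {
    "FP + FN + TN":                 ["FP", "FN", "TN"],
    "REC + FP + FN + TN":           ["REC", "FP", "FN", "TN"],
    "REC + FPC + FPNC + FN + TN":   ["REC", "FPC", "FPNC", "FN", "TN"],
}

X_tr, X_te, y_tr, y_te = train_test_split(df, target, test_size=0.3, random_state=0)
for name, cols in models.items():
    reg = LinearRegression().fit(X_tr[cols], y_tr)
    print(f"{name}: R^2 train = {reg.score(X_tr[cols], y_tr):.2f}, "
          f"R^2 test = {reg.score(X_te[cols], y_te):.2f}")
```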
Slide 14: Model 3 - Results
CTC - ITC = REC + FPC + FPNC + FN + TN + k

Coefficient    Value
k               0.41
C_REC           0.62
C_FPNC         -0.48
C_FPC          -2.12
C_FN           -1.33
C_TN           -0.55
Slide 15: Other Models
- Completion (binary): logistic regression model (sketched below); the estimated model does not indicate a good fit
- User satisfaction (5-point scale): based on only 35 dialogs; R² = 0.61 (similar to the literature, e.g., Walker et al.)
- Explanation: subjectivity of the metric + limited dataset
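For the binary completion target mentioned in the first bullet, a minimal sketch of the corresponding logistic-regression fit; the data file and column names are placeholders, and (as the slide notes) this model did not fit well on the real data.

```python
# Logistic regression of binary task completion on per-session error counts.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("sessions.csv")             # hypothetical per-session data
X = df[["FP", "FN", "TN"]]                   # error counts (assumed column names)
y = df["completed"]                          # 1 = dialog completed, 0 = not completed

clf = LogisticRegression(max_iter=1000).fit(X, y)
print("coefficients:", dict(zip(X.columns, clf.coef_[0])))
print("mean accuracy:", clf.score(X, y))
```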
Slide 16: Problem Formulation (revisited)
(1) Develop a cost model that allows us to quantitatively assess the costs of FP and FN errors.
(2) Use the costs to pick the optimal tradeoff point on the classifier's ROC curve.
Slide 17: Tuning the Confidence Annotator
- Using Model 3: CTC - ITC = REC + FPNC + FPC + FN + TN + k
- Drop k and REC, and plug in the estimated coefficients:
  Cost = 0.48*FPNC + 2.12*FPC + 1.33*FN + 0.56*TN
- Minimize Cost instead of the classification error rate (FP + FN), and we implicitly maximize concept transfer efficiency (see the threshold-sweep sketch below)
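A minimal sketch of tuning the annotator by sweeping the confidence threshold and minimizing the cost above instead of FP + FN. The scores, labels, and concept indicators are placeholders, and splitting false acceptances into FPC (with concepts) and FPNC (without concepts) follows my reading of the slides.

```python
# Sweep the acceptance threshold and pick the one that minimizes
# Cost = 0.48*FPNC + 2.12*FPC + 1.33*FN + 0.56*TN   (Model 3 costs).
import numpy as np

rng = np.random.default_rng(0)
score = rng.random(2000)                 # classifier confidence scores (placeholder)
is_bad = rng.random(2000) < 0.3          # True if the utterance is actually misunderstood
has_concepts = rng.random(2000) < 0.8    # True if the utterance carries concepts

def total_cost(threshold):
    accept = score >= threshold
    fpc  = np.sum(accept & is_bad & has_concepts)    # false acceptances with concepts
    fpnc = np.sum(accept & is_bad & ~has_concepts)   # false acceptances without concepts
    fn   = np.sum(~accept & ~is_bad)                 # false rejections of good utterances
    tn   = np.sum(~accept & is_bad)                  # correct rejections (still cost a turn)
    return 0.48 * fpnc + 2.12 * fpc + 1.33 * fn + 0.56 * tn

thresholds = np.linspace(0.0, 1.0, 101)
best = min(thresholds, key=total_cost)
print(f"best threshold = {best:.2f}, cost = {total_cost(best):.1f}")
```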
Slide 18: Operating Characteristic (figure)
Slide 19: Further Analysis
- Is CTC - ITC really modeling dialog performance?
- Mean = 0.71, std. dev. = 0.28
- Mean for completed dialogs = 0.82
- Mean for uncompleted dialogs = 0.57
- The difference between the means is significant at a very high level of confidence: p = 7.23 x 10^-9 (t-test; sketched below)
- So it looks like CTC - ITC is okay, right?
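A sketch of the significance check described above: a two-sample t-test on per-dialog CTC - ITC for completed versus uncompleted dialogs. The values are simulated placeholders chosen to roughly match the reported means; only the slide's numbers are real.

```python
# Two-sample t-test on per-dialog efficiency (CTC - ITC), completed vs. not.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
completed   = rng.normal(0.82, 0.28, size=80)   # placeholder per-dialog CTC - ITC values
uncompleted = rng.normal(0.57, 0.28, size=54)

t, p = stats.ttest_ind(completed, uncompleted, equal_var=False)
print(f"t = {t:.2f}, p = {p:.2e}")              # the slide reports p = 7.23e-9
```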
Slide 20: Further Analysis (cont'd)
- Can we reliably extrapolate to other areas of the operating characteristic?
Slide 21: Further Analysis (cont'd)
- Can we reliably extrapolate to other areas of the operating characteristic?
- Yes: look at the distribution of the FP and FN ratios across dialogs.
Slide 22: Further Analysis (cont'd)
- What is the impact of the baseline error rate?
- Compared models constructed from high-error-rate and low-error-rate dialogs
- For the low error rate, the cost curve becomes monotonically increasing
- This clearly indicates that "trust everything / use no confidence annotation" is the way to go in that setting
Slide 23: Our Explanation So Far...
- Incorrectly captured information can easily be overwritten in the CMU Communicator
- Relatively low error rates
- The likelihood of repeated misrecognition is low
Slide 24: Conclusion
- A data-driven approach to quantitatively assess the costs of various types of misunderstandings.
- Models based on efficiency fit the data well; the obtained costs confirm intuition.
- For the CMU Communicator, the model predicts that the total cost stays the same across a large range of the classifier's operating characteristic.
Slide 25: Further Experiments
- But, of course, we can verify the predictions experimentally:
- Collect new data with the system running at a very low threshold; 55 dialogs collected so far.
- Thanks to those who have participated in these experiments; "help if you have the time" to the others: www.cs.cmu.edu/~dbohus/scenarios.htm
- Re-estimate the models and verify the predictions.
Slide 26: Confusion Matrix

                   OK    BAD
System says OK     TP    FP
System says BAD    FN    TN

- FP = false acceptance
- FN = false detection/rejection
- Fallout = FP / (FP + TN) = FP / N_BAD
- CDR = 1 - Fallout = 1 - (FP / N_BAD)
(These rates are computed in the sketch below.)
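A tiny sketch of the derived rates on this slide, computed from the confusion-matrix counts; the example counts are invented.

```python
# Fallout and correct detection rate (CDR) from confusion-matrix counts.
def fallout_and_cdr(fp, tn):
    n_bad = fp + tn                   # all truly misunderstood utterances
    fallout = fp / n_bad              # fraction of bad utterances falsely accepted
    return fallout, 1.0 - fallout     # CDR = 1 - fallout

print(fallout_and_cdr(fp=30, tn=170))   # -> (0.15, 0.85) with these invented counts
```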