A principled approach for rejection threshold optimization. Dan Bohus (www.cs.cmu.edu/~dbohus), Alexander I. Rudnicky (www.cs.cmu.edu/~air), Computer Science Department.


1 a principled approach for rejection threshold optimization
Dan Bohus, www.cs.cmu.edu/~dbohus
Alexander I. Rudnicky, www.cs.cmu.edu/~air
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, 15217

2 understanding errors and rejection
- systems often misunderstand
- use confidence scores
- common design pattern
  - compare input confidence against a threshold
  - reject utterance if confidence is too low
- may lead to false rejections
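The design pattern on this slide can be sketched in a few lines of Python. The function name, the string labels, and the default threshold of 0.5 are illustrative assumptions, not part of the system described in the talk.

```python
def accept_or_reject(confidence: float, threshold: float = 0.5) -> str:
    """Return "accept" or "reject" for one recognized utterance.

    `confidence` is assumed to lie in [0, 1]; the 0.5 default is an
    arbitrary placeholder, not a recommended value.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return "reject" if confidence < threshold else "accept"


# Hypothetical confidence scores for three utterances.
decisions = [accept_or_reject(c) for c in (0.2, 0.5, 0.9)]
```

If the first utterance had in fact been recognized correctly, rejecting it would be a false rejection; lowering the threshold avoids that, at the price of letting more misunderstandings through.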

3 rejection tradeoff
- misunderstandings vs. false rejections
[Figure: misunderstandings and false rejections (0% to 75%) plotted against the rejection threshold (0 to 1)]

4 rejection tradeoff
- misunderstandings vs. false rejections
- correctly vs. incorrectly transferred concepts
[Figure: correctly and incorrectly transferred concepts per turn plotted against the rejection threshold (0 to 1)]

5 question
given this trade-off, how can we optimize the rejection threshold in a principled fashion?

6 outline
- current solutions
- proposed approach
- data
- results
- conclusion

7 current solutions
- follow ASR manual [Nuance documentation]
- acknowledge the tradeoff + postulate costs
  - “misunderstandings are X times more costly than false rejections” [Raymond et al., 2004; Kawahara et al., 2000; Cuayahuitl et al., 2002]
- costs are likely to differ
  - across domains / systems
  - across dialog states within a system

8 proposed approach
- derive costs in a principled fashion
1. identify a set of variables involved in the tradeoff: correctly and incorrectly transferred concepts per turn (CTC, ITC)
2. choose a dialog performance metric: task completion (binary, kappa), TC
3. build a regression model: $\mathrm{logit}(TC) \leftarrow C_0 + C_{CTC} \cdot CTC + C_{ITC} \cdot ITC$
4. optimize threshold to maximize performance: $th^* = \arg\max_{th} \left( C_{CTC} \cdot CTC + C_{ITC} \cdot ITC \right)$
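Step 4 can be sketched as a simple grid search, assuming the coefficients from step 3 are already known. The data layout (one tuple per labeled turn), the function names, the grid resolution, and the toy numbers are all assumptions made for illustration; the slides do not say how the argmax was computed.

```python
from typing import List, Tuple

# One labeled user turn: (confidence, correct concepts, incorrect concepts).
Turn = Tuple[float, int, int]


def concepts_per_turn(turns: List[Turn], threshold: float) -> Tuple[float, float]:
    """Average CTC and ITC per turn when turns below `threshold` are rejected.

    A rejected turn is assumed to transfer no concepts, correct or incorrect.
    """
    if not turns:
        raise ValueError("need at least one labeled turn")
    accepted = [t for t in turns if t[0] >= threshold]
    ctc = sum(t[1] for t in accepted) / len(turns)
    itc = sum(t[2] for t in accepted) / len(turns)
    return ctc, itc


def best_threshold(turns: List[Turn], c_ctc: float, c_itc: float,
                   steps: int = 100) -> float:
    """Grid search for th* = argmax (c_ctc * CTC + c_itc * ITC)."""

    def utility(th: float) -> float:
        ctc, itc = concepts_per_turn(turns, th)
        return c_ctc * ctc + c_itc * itc

    # On ties, max() keeps the lowest threshold that reaches the best utility.
    return max((i / steps for i in range(steps + 1)), key=utility)


# Hypothetical toy data and coefficients, for illustration only.
toy_turns = [(0.9, 1, 0), (0.8, 2, 0), (0.6, 1, 0), (0.4, 0, 1), (0.3, 0, 2)]
th_star = best_threshold(toy_turns, c_ctc=1.0, c_itc=-1.0)
```

On this toy data the search settles just above 0.4, the point where every turn carrying only incorrect concepts is rejected while every turn carrying correct concepts is still accepted.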

9 state-specific costs
- costs are different in different dialog states
- CTC and ITC on a per-state basis
$\mathrm{logit}(TC) \leftarrow C_0 + C_{CTC,state1} \cdot CTC_{state1} + C_{ITC,state1} \cdot ITC_{state1} + C_{CTC,state2} \cdot CTC_{state2} + C_{ITC,state2} \cdot ITC_{state2} + C_{CTC,state3} \cdot CTC_{state3} + C_{ITC,state3} \cdot ITC_{state3} + \dots$
- optimize separate threshold for each state
$th^*_{state_x} = \arg\max_{th} \left( C_{CTC,state_x} \cdot CTC_{state_x} + C_{ITC,state_x} \cdot ITC_{state_x} \right)$
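A minimal sketch of the per-state optimization, under stated assumptions: the coefficients are the ones reported in the cost coefficients table on slide 13, while the data layout, the function name, the grid search, and the toy turns are hypothetical.

```python
from collections import defaultdict

# (CTC coefficient, ITC coefficient) per state class, copied from the
# cost coefficients table on slide 13.
COSTS = {
    "open-request": (0.5518, -0.4067),
    "request(bool)": (3.3127, -0.5959),
    "request(non-bool)": (2.5514, -3.441),
}


def per_state_thresholds(turns, costs=COSTS, steps=100):
    """Pick one rejection threshold per state class by grid search.

    `turns` is an iterable of (state, confidence, n_correct, n_incorrect);
    this layout is an assumption made for the sketch.
    """
    by_state = defaultdict(list)
    for state, conf, n_ok, n_bad in turns:
        by_state[state].append((conf, n_ok, n_bad))

    thresholds = {}
    for state, rows in by_state.items():
        c_ctc, c_itc = costs[state]

        def utility(th, rows=rows, c_ctc=c_ctc, c_itc=c_itc):
            kept = [r for r in rows if r[0] >= th]
            ctc = sum(r[1] for r in kept) / len(rows)
            itc = sum(r[2] for r in kept) / len(rows)
            return c_ctc * ctc + c_itc * itc

        thresholds[state] = max((i / steps for i in range(steps + 1)), key=utility)
    return thresholds


# Hypothetical turns: identical confidences and labels in two state classes.
toy = [
    ("request(bool)", 0.2, 1, 0),
    ("request(bool)", 0.3, 0, 1),
    ("request(bool)", 0.8, 1, 0),
    ("request(non-bool)", 0.2, 1, 0),
    ("request(non-bool)", 0.3, 0, 1),
    ("request(non-bool)", 0.8, 1, 0),
]
state_thresholds = per_state_thresholds(toy)
```

With identical toy data in both classes, the weak ITC penalty for request(bool) keeps its threshold at 0 (accept everything), while the strong ITC penalty for request(non-bool) pushes its threshold above 0.3.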

10 outline
- current solutions
- proposed approach
- data
- results
- conclusion

11 data
- collected using RoomLine
  - phone-based, mixed-initiative spoken dialog system
  - conference room reservations
  - sphinx-2
  - utterance-level confidence annotator [0-1]
- 46 participants (first-time users)
  - 10 scenario-driven interactions
- corpus
  - 449 dialog sessions
  - 8278 user turns
  - manually labeled decoded concept “correctness”

12 roomline states
- 71 “dialog states” total
- clustered into 3 classes
  - open-request: How may I help you?
  - request(bool): Would you like a reservation for this room? Would you like a room with a projector?
  - request(non-bool): For what time would you like to reserve the room?

13 results: task success model

model predicting binary task success

          Baseline   Train     Cross-V   p
AVG-LL    -0.4655    -0.2952   -0.3059   < 10^-4
HARD      17.62%     11.66%    11.75%

cost coefficients

Variable                   Coeff     p        se
Const                      -2.3442   0.0416   1.1504
CTC / open-request          0.5518   0.0619   0.2955
ITC / open-request         -0.4067   0.3801   0.4634
CTC / request(bool)         3.3127   0.0010   1.0076
ITC / request(bool)        -0.5959   0.6491   1.3098
CTC / request(non-bool)     2.5514   0.0017   0.8137
ITC / request(non-bool)    -3.441    0.0018   1.1046

14 results: threshold optimization
- open-request: utility = 0.55 x CTC - 0.40 x ITC
- cost coefficients: same table as on slide 13
[Figure: open-request; correctly and incorrectly transferred concepts per turn plotted against the rejection threshold (0 to 1)]

15 results: threshold optimization
- open-request: utility = 0.55 x CTC - 0.40 x ITC
- request(bool): utility = 3.31 x CTC - 0.60 x ITC
- request(non-bool): utility = 2.55 x CTC - 3.44 x ITC
[Figure: one panel per state class; correctly and incorrectly transferred concepts per turn plotted against the rejection threshold (0 to 1)]
- utility profiles are different across the three states
- task duration models lead to similar results

16 conclusion
- principled method for optimizing rejection threshold
- determine costs for various types of understanding errors
- data-driven approach
- can derive state-specific costs
- bridge mismatches between off-the-shelf confidence annotators and domain

17 thank you

18 fit for task success model

19 expected changes in task success

State              Metric   Current   New Estimate   Delta
Open-request       CTC      0.54      0.89           +0.35
Open-request       ITC      0.16      0.31           +0.15
Request bool       CTC      0.84      0.86           +0.02
Request bool       ITC      0.09      0.12           +0.03
Request non-bool   CTC      0.72      0.66           -0.06
Request non-bool   ITC      0.25      0.17           -0.08

                   Current   New Estimate   Delta
Task success       82.75%    87.16%         +4.41%

Remains to be seen …
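One plausible way to relate the per-state averages to a task success estimate is to evaluate the logistic model from slide 13 at those averages. The slides do not state how the estimate was actually computed, so the sketch below is an assumption; the coefficients and the CTC/ITC values are copied from the slides, while the function and variable names are hypothetical.

```python
import math

# Constant and (CTC, ITC) coefficients from the task success model (slide 13).
CONST = -2.3442
COEFFS = {
    "open-request": (0.5518, -0.4067),
    "request(bool)": (3.3127, -0.5959),
    "request(non-bool)": (2.5514, -3.441),
}

# Per-state (CTC, ITC) averages from slide 19.
CURRENT = {
    "open-request": (0.54, 0.16),
    "request(bool)": (0.84, 0.09),
    "request(non-bool)": (0.72, 0.25),
}
NEW = {
    "open-request": (0.89, 0.31),
    "request(bool)": (0.86, 0.12),
    "request(non-bool)": (0.66, 0.17),
}


def predicted_task_success(per_state):
    """Inverse logit of the linear predictor evaluated at the given averages."""
    z = CONST
    for state, (ctc, itc) in per_state.items():
        c_ctc, c_itc = COEFFS[state]
        z += c_ctc * ctc + c_itc * itc
    return 1.0 / (1.0 + math.exp(-z))


p_current = predicted_task_success(CURRENT)
p_new = predicted_task_success(NEW)
```

Evaluated this way, the model gives roughly 83% for the current thresholds and roughly 87% for the new ones, close to but not identical with the figures on the slide, which is expected if the slide's numbers were computed per session rather than at the averages.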

20 task duration model

Variable               Coeff     p        se
Const                   1.2750   0.0000   0.1019
CTC / oreq             -0.1769   0.0000   0.0187
ITC / oreq             -0.1567   0.0001   0.0401
CTC / req(bool)        -0.7865   0.0000   0.0869
ITC / req(bool)        -0.6389   0.0000   0.1297
CTC / req(non-bool)    -0.5127   0.0000   0.0440
ITC / req(non-bool)     0.4256   0.0000   0.0851

21 Model 2: Resulting fit and coefficients
R^2 = 0.56

