1 a principled approach for rejection threshold optimization
  Dan Bohus, www.cs.cmu.edu/~dbohus
  Alexander I. Rudnicky, www.cs.cmu.edu/~air
  Computer Science Department, Carnegie Mellon University
  Pittsburgh, PA, 15217
2 understanding errors and rejection
  systems often misunderstand
  use confidence scores
  common design pattern
    compare input confidence against a threshold
    reject utterance if confidence is too low
  may lead to false rejections
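The design pattern on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `should_reject`, the example scores, and the threshold of 0.5 are assumptions, not values from the presentation.

```python
def should_reject(confidence: float, threshold: float) -> bool:
    """Return True when the utterance should be rejected (confidence too low)."""
    return confidence < threshold


# Hypothetical utterance-level confidence scores, for illustration only.
scores = [0.12, 0.48, 0.91]

# With a threshold of 0.5, the first two utterances are rejected. If either of
# them was in fact understood correctly, that rejection is a false rejection.
decisions = [should_reject(s, threshold=0.5) for s in scores]
```

Raising the threshold rejects more utterances, which reduces misunderstandings but increases false rejections.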
3 rejection tradeoff
  misunderstandings vs. false rejections
  [figure: misunderstandings and false rejections (0% to 75%) as a function of the rejection threshold (0 to 1)]
4 rejection tradeoff
  misunderstandings vs. false rejections
  correctly vs. incorrectly transferred concepts
  [figure: correctly and incorrectly transferred concepts per turn as a function of the rejection threshold (0 to 1)]
5 question
  given this trade-off, how can we optimize the rejection threshold in a principled fashion?
6 outline
  current solutions
  proposed approach
  data
  results
  conclusion
7 current solutions
  follow ASR manual [Nuance documentation]
  acknowledge the tradeoff + postulate costs
    “misunderstandings are X times more costly than false rejections”
    [Raymond et al., 2004; Kawahara et al., 2000; Cuayahuitl et al., 2002]
  costs are likely to differ
    across domains / systems
    across dialog states within a system
8 proposed approach
  derive costs in a principled fashion
  1. identify a set of variables involved in the tradeoff
     correctly and incorrectly transferred concepts per turn (CTC, ITC)
  2. choose a dialog performance metric
     task completion (binary, kappa): TC
  3. build a regression model
     $\mathrm{logit}(TC) \leftarrow C_0 + C_{CTC} \cdot CTC + C_{ITC} \cdot ITC$
  4. optimize threshold to maximize performance
     $th^* = \arg\max_{th} \left( C_{CTC} \cdot CTC + C_{ITC} \cdot ITC \right)$
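Step 4 can be illustrated with a simple grid search over candidate thresholds. The sketch below is not the authors' implementation: the turn representation, the helper names `transfer_rates` and `best_threshold`, the toy data, and the cost coefficients (+1 and -1) are all assumptions made for illustration. It assumes that a rejected turn transfers no concepts, so CTC and ITC are counted over accepted turns and averaged over all turns.

```python
from typing import List, Tuple

# One turn: (confidence, correctly decoded concepts, incorrectly decoded concepts).
Turn = Tuple[float, int, int]


def transfer_rates(turns: List[Turn], threshold: float) -> Tuple[float, float]:
    """CTC and ITC per turn when turns below the threshold are rejected."""
    accepted = [t for t in turns if t[0] >= threshold]
    n = len(turns)
    ctc = sum(t[1] for t in accepted) / n
    itc = sum(t[2] for t in accepted) / n
    return ctc, itc


def best_threshold(turns: List[Turn], c_ctc: float, c_itc: float,
                   steps: int = 100) -> Tuple[float, float]:
    """Grid search for the threshold maximizing c_ctc * CTC + c_itc * ITC."""
    best_th, best_u = 0.0, float("-inf")
    for k in range(steps + 1):
        th = k / steps
        ctc, itc = transfer_rates(turns, th)
        u = c_ctc * ctc + c_itc * itc
        if u > best_u:  # strict comparison keeps the lowest maximizing threshold
            best_th, best_u = th, u
    return best_th, best_u


# Hypothetical labeled turns and hypothetical costs, for illustration only.
turns = [(0.2, 0, 1), (0.4, 0, 1), (0.6, 1, 0), (0.9, 2, 0)]
best_th, best_u = best_threshold(turns, c_ctc=1.0, c_itc=-1.0)
```

In the approach described on the slide, the coefficients would come from the regression model in step 3 rather than being postulated.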
9 state-specific costs
  costs are different in different dialog states
  CTC and ITC on a per-state basis
    $\mathrm{logit}(TC) \leftarrow C_0 + C_{CTC_{state_1}} \cdot CTC_{state_1} + C_{ITC_{state_1}} \cdot ITC_{state_1} + C_{CTC_{state_2}} \cdot CTC_{state_2} + C_{ITC_{state_2}} \cdot ITC_{state_2} + C_{CTC_{state_3}} \cdot CTC_{state_3} + C_{ITC_{state_3}} \cdot ITC_{state_3} + \dots$
  optimize separate threshold for each state
    $th^*_{state_x} = \arg\max_{th} \left( C_{CTC_{state_x}} \cdot CTC_{state_x} + C_{ITC_{state_x}} \cdot ITC_{state_x} \right)$
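The per-state optimization can be sketched by grouping turns by dialog state and running one threshold search per group. The cost pairs below are the rounded per-state coefficients reported in the results slides of this deck. The turn data, the helper names `utility` and `per_state_thresholds`, and the grid resolution are hypothetical.

```python
from collections import defaultdict


def utility(turns, threshold, c_ctc, c_itc):
    """c_ctc * CTC + c_itc * ITC per turn, with turns below threshold rejected."""
    n = len(turns)
    ctc = sum(c for conf, c, i in turns if conf >= threshold) / n
    itc = sum(i for conf, c, i in turns if conf >= threshold) / n
    return c_ctc * ctc + c_itc * itc


def per_state_thresholds(turns, costs, steps=100):
    """Return {state: threshold}, optimizing each state's utility separately."""
    by_state = defaultdict(list)
    for state, conf, correct, incorrect in turns:
        by_state[state].append((conf, correct, incorrect))
    grid = [k / steps for k in range(steps + 1)]
    # max() returns the first (lowest) threshold among ties.
    return {
        state: max(grid, key=lambda th: utility(group, th, *costs[state]))
        for state, group in by_state.items()
    }


# (C_CTC, C_ITC) per state, rounded values taken from the results slides.
costs = {"request(bool)": (3.31, -0.60), "request(non-bool)": (2.55, -3.44)}

# Hypothetical turns: (state, confidence, correct concepts, incorrect concepts).
turns = [
    ("request(bool)", 0.3, 1, 0),
    ("request(bool)", 0.5, 0, 1),
    ("request(bool)", 0.8, 1, 0),
    ("request(non-bool)", 0.3, 0, 1),
    ("request(non-bool)", 0.5, 1, 0),
    ("request(non-bool)", 0.8, 1, 0),
]
thresholds = per_state_thresholds(turns, costs)
```

On this toy data the state with the larger ITC penalty, request(non-bool), ends up with the higher threshold, while request(bool) accepts everything.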
10 outline
  current solutions
  proposed approach
  data
  results
  conclusion
11 data
  collected using RoomLine
    phone-based, mixed-initiative spoken dialog system
    conference room reservations
    sphinx-2
    utterance-level confidence annotator [0-1]
  46 participants (first-time users)
  10 scenario-driven interactions
  corpus
    449 dialog sessions
    8278 user turns
    manually labeled decoded concept “correctness”
12 roomline states
  71 “dialog states” total, clustered into 3 classes
  open-request
    How may I help you?
  request(bool)
    Would you like a reservation for this room?
    Would you like a room with a projector?
  request(non-bool)
    For what time would you like to reserve the room?
13 results: task success model
  model predicting binary task success

            Baseline   Train     Cross-V   p
  AVG-LL    -0.4655    -0.2952   -0.3059   < 10^-4
  HARD      17.62%     11.66%    11.75%

  cost coefficients

  Variable                  Coeff     p        se
  Const                     -2.3442   0.0416   1.1504
  CTC / open-request         0.5518   0.0619   0.2955
  ITC / open-request        -0.4067   0.3801   0.4634
  CTC / request(bool)        3.3127   0.0010   1.0076
  ITC / request(bool)       -0.5959   0.6491   1.3098
  CTC / request(non-bool)    2.5514   0.0017   0.8137
  ITC / request(non-bool)   -3.441    0.0018   1.1046
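To show how the fitted coefficients turn into a prediction, the sketch below applies the inverse logit to a linear combination of the per-state variables. The coefficients are copied from this slide. The feature values are the "Current" CTC and ITC figures from the backup slide on expected changes in task success, and treating them as the model inputs for a single session is an assumption made for illustration, as is the function name `p_success`.

```python
import math

# Coefficients as reported on the slide.
COEFFS = {
    "Const": -2.3442,
    "CTC / open-request": 0.5518,
    "ITC / open-request": -0.4067,
    "CTC / request(bool)": 3.3127,
    "ITC / request(bool)": -0.5959,
    "CTC / request(non-bool)": 2.5514,
    "ITC / request(non-bool)": -3.441,
}


def p_success(features: dict) -> float:
    """Predicted task-success probability: inverse logit of the linear score."""
    z = COEFFS["Const"] + sum(COEFFS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Assumed inputs: per-turn CTC and ITC in each state class.
features = {
    "CTC / open-request": 0.54,
    "ITC / open-request": 0.16,
    "CTC / request(bool)": 0.84,
    "ITC / request(bool)": 0.09,
    "CTC / request(non-bool)": 0.72,
    "ITC / request(non-bool)": 0.25,
}
p = p_success(features)
```

Because the model is linear in the logit, each coefficient acts as a per-concept cost or benefit for its state, which is what makes the utility expressions used for threshold optimization possible.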
14 results: threshold optimization
  open-request
  utility = 0.55 x CTC - 0.40 x ITC
  [figure: correctly transferred concepts per turn, incorrectly transferred concepts per turn, and utility as a function of the rejection threshold (0 to 1)]
  [table: cost coefficients, as on slide 13]
15 results: threshold optimization
  open-request:       utility = 0.55 x CTC - 0.40 x ITC
  request(bool):      utility = 3.31 x CTC - 0.60 x ITC
  request(non-bool):  utility = 2.55 x CTC - 3.44 x ITC
  [figure: one panel per state, plotting correctly transferred concepts per turn, incorrectly transferred concepts per turn, and utility against the rejection threshold (0 to 1)]
  utility profiles are different across the three states
  task duration models lead to similar results
16 conclusion
  principled method for optimizing rejection threshold
  determine costs for various types of understanding errors
  data-driven approach
  can derive state-specific costs
  bridge mismatches between off-the-shelf confidence annotators and domain
17 thank you
18 fit for task success model
19 expected changes in task success

                          Current   New Estimate   Delta
  Open-request       CTC  0.54      0.89           +0.35
                     ITC  0.16      0.31           +0.15
  Request bool       CTC  0.84      0.86           +0.02
                     ITC  0.09      0.12           +0.03
  Request non-bool   CTC  0.72      0.66           -0.06
                     ITC  0.25      0.17           -0.08

                          Current   New Estimate   Delta
  Task success            82.75%    87.16%         +4.41%

  Remains to be seen …
20 task duration model

  Variable              Coeff     p        se
  Const                  1.2750   0.0000   0.1019
  CTC / oreq            -0.1769   0.0000   0.0187
  ITC / oreq            -0.1567   0.0001   0.0401
  CTC / req(bool)       -0.7865   0.0000   0.0869
  ITC / req(bool)       -0.6389   0.0000   0.1297
  CTC / req(non-bool)   -0.5127   0.0000   0.0440
  ITC / req(non-bool)    0.4256   0.0000   0.0851
21 Model 2: Resulting fit and coefficients
  R^2 = 0.56