1 A Confidence Interval for the Misclassification Rate
S.A. Murphy & E.B. Laber
2 Outline
–Review
–Three challenges in constructing CIs
–Combining a statistical approach with a learning theory approach to constructing CIs
–Relevance to confidence measures for the value of a policy
3 Review
–X is the vector of features, Y is the binary classification
–Misclassification rate: the probability that a classifier's prediction differs from Y
–Data: N iid observations of (Y,X)
–Given a space of classifiers and the data, use some method to construct a classifier
–The goal is to provide a CI for the misclassification rate of the constructed classifier
4 Review
–Since the loss function is not smooth, one commonly uses a surrogate loss to estimate the classifier
–Surrogate loss: L(Y,f(X))
5 Review
General approach to providing a CI:
–Estimate the misclassification rate using the data
–Derive an approximate distribution for the error of this estimate
–Use this approximate distribution to construct a confidence interval for the misclassification rate
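6 Three challenges
1) The space of classifiers is too large, leading to over-fitting and negative bias in the estimated misclassification rate
2) The misclassification rate is a non-smooth function
3) The misclassification rate may behave like an extreme quantity
No assumption is made that the constructed classifier is close to optimal.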
7 Three challenges
2) The misclassification rate is non-smooth.
Example: The unknown Bayes classifier has a quadratic decision boundary. We fit, by least squares, a linear decision boundary f(x) = sign(β0 + β1 x)
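The example on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' simulation: the data-generating model (uniform x, a logistic link in x² − 1), the sample sizes, and all function names are assumptions chosen only so that the Bayes boundary is quadratic while the fitted rule is linear.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, rng):
    """Draw (x, y) with y in {-1, +1}. Assumed toy model, not from the slides."""
    x = rng.uniform(-2.0, 2.0, size=n)
    # P(Y = 1 | x) exceeds 1/2 exactly when x**2 > 1, so the Bayes rule is
    # sign(x**2 - 1): a quadratic decision boundary.
    p = 1.0 / (1.0 + np.exp(-3.0 * (x**2 - 1.0)))
    y = np.where(rng.uniform(size=n) < p, 1, -1)
    return x, y

def fit_least_squares(x, y):
    """Regress y on (1, x) by least squares and return (beta0, beta1)."""
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y.astype(float), rcond=None)
    return beta

def classify(beta, x):
    """Linear rule f(x) = sign(beta0 + beta1 * x); sign(0) is mapped to +1."""
    return np.where(beta[0] + beta[1] * x >= 0, 1, -1)

def misclassification_rate(beta, x, y):
    """Fraction of observations whose predicted label differs from y."""
    return float(np.mean(classify(beta, x) != y))

x_train, y_train = simulate(30, rng)
beta_hat = fit_least_squares(x_train, y_train)

# A large independent sample stands in for the unknown population.
x_big, y_big = simulate(100_000, rng)
print("training error:", misclassification_rate(beta_hat, x_train, y_train))
print("approx. true error:", misclassification_rate(beta_hat, x_big, y_big))
```

Because the rate depends on the fitted coefficients only through an indicator, a small change in the coefficients can flip predictions for a whole region of x at once, which is the non-smoothness the slide refers to.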
8 Density of
9 Bootstrap CI for
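For readers unfamiliar with the construction, a naive percentile bootstrap interval for the error of a least-squares linear classifier might look like the sketch below. The slides do not say exactly which bootstrap variant was plotted, so the resampling scheme (refit on each resample and record the resample's training error), the toy data, and the function names are all assumptions.

```python
import numpy as np

def fit_and_error(x, y):
    """Fit sign(b0 + b1 * x) by least squares; return the training error."""
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y.astype(float), rcond=None)
    pred = np.where(design @ beta >= 0, 1, -1)
    return float(np.mean(pred != y))

def percentile_bootstrap_ci(x, y, n_boot=500, level=0.95, seed=0):
    """Percentile interval from the bootstrap distribution of the error."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample indices with replacement
        stats[b] = fit_and_error(x[idx], y[idx])
    alpha = 1.0 - level
    lo, hi = np.quantile(stats, [alpha / 2, 1.0 - alpha / 2])
    return float(lo), float(hi)

# Hypothetical toy data with a quadratic signal, for illustration only.
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=50)
y = np.where(x**2 + rng.normal(scale=0.5, size=50) > 1.0, 1, -1)

lo, hi = percentile_bootstrap_ci(x, y)
print("95% percentile bootstrap CI:", (lo, hi))
```

Nothing in this construction corrects for the non-smoothness of the error rate, so its nominal level should not be taken at face value.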
10 Misclassification Rate is Non-smooth
Coverage of 95% CI

Sample Size   Bootstrap Percentile   Yang CV   CUD-Bound
30            .85                    .29       .91
50            .88                    .24       .92
100           .83                    .20       .94
200           .85                    .22       .95
11 CIs for Extreme Quantities
3) The misclassification rate may behave like an extreme quantity
Should this be problematic?
–Highly skewed distribution of the estimated misclassification rate
–Fast convergence of the misclassification rate to zero
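12 CIs from Learning Theory
Given a result of the form: with probability at least 1-δ, the estimated misclassification rate is within a known bound of the true rate for every classifier in a class. If the constructed classifier is known to belong to that class, the estimate plus or minus the bound forms a 1-δ CI.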
13 Combine statistical ideas with learning theory ideas Construct a confidence interval for where is chosen to be small yet contain ---from this CI deduce a conservative CI for ---use the surrogate loss to smooth the maximization and to construct
14 Construct a confidence interval for --- should contain all that are close to --- all f for which --- is the “limiting value” of ;
15 Confidence Interval Construct a confidence interval for --- is a rate; I would like ---
16 Confidence Interval
17 Bootstrap We use bootstrap to obtain an estimate of an upper percentile of the distribution of to obtain b *. The CI is then
18 Implementation
–Approximation space for the classifier is linear
–Surrogate loss is least squares
–The estimate of the misclassification rate is the .632 estimator
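As a reminder of what the .632 estimator computes, here is a minimal sketch for a linear classifier fit by least squares. It uses the standard definition (0.368 times the training error plus 0.632 times the leave-one-out bootstrap error); the toy data, the number of resamples, and the function names are assumptions, and the authors' implementation may differ in detail.

```python
import numpy as np

def fit(X, y):
    """Least-squares coefficients for regressing the +/-1 labels on X."""
    beta, *_ = np.linalg.lstsq(X, y.astype(float), rcond=None)
    return beta

def predict(beta, X):
    """Linear rule sign(X @ beta); sign(0) is mapped to +1."""
    return np.where(X @ beta >= 0, 1, -1)

def err_632(X, y, n_boot=200, seed=0):
    """The .632 estimate of the misclassification rate."""
    rng = np.random.default_rng(seed)
    n = len(y)
    train_err = np.mean(predict(fit(X, y), X) != y)
    wrong = np.zeros(n)     # times observation i was misclassified when left out
    left_out = np.zeros(n)  # times observation i was absent from a resample
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        out = np.setdiff1d(np.arange(n), idx)  # observations not in the resample
        if out.size == 0:
            continue
        beta = fit(X[idx], y[idx])
        wrong[out] += predict(beta, X[out]) != y[out]
        left_out[out] += 1
    seen = left_out > 0
    loo_err = np.mean(wrong[seen] / left_out[seen])  # leave-one-out bootstrap error
    return float(0.368 * train_err + 0.632 * loo_err)

# Hypothetical toy data: intercept plus two features, noisy linear signal.
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(60), rng.normal(size=(60, 2))])
y = np.where(X[:, 1] - X[:, 2] + rng.normal(scale=0.8, size=60) >= 0, 1, -1)

print(".632 estimate:", err_632(X, y))
```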
19 Implementation becomes
21 Computational Issues
Partition R^p into equivalence classes defined by the pattern of signs.
Each equivalence class can be written as a set of β satisfying linear constraints.
The term in absolute values is constant on each equivalence class.
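The partition idea can be made concrete with a short sketch. It assumes that the "pattern of signs" is the vector of signs of x_i^T β over the N observations, which is the natural reading for a linear classifier but is not spelled out in the surviving text; the helper names and the toy data are hypothetical.

```python
import numpy as np

def sign_pattern(beta, X):
    """Signs of x_i^T beta over the rows of X; zero is mapped to +1."""
    return tuple(int(s) for s in np.where(X @ beta >= 0, 1, -1))

def satisfies_constraints(beta, pattern, X):
    """Check the linear constraints that define the class of `pattern`:
    x_i^T beta >= 0 where the sign is +1, and x_i^T beta < 0 where it is -1."""
    margins = X @ beta
    s = np.array(pattern)
    return bool(np.all(np.where(s == 1, margins >= 0, margins < 0)))

def training_error(beta, X, y):
    """Depends on beta only through its sign pattern."""
    return float(np.mean(np.array(sign_pattern(beta, X)) != y))

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(8), rng.normal(size=8)])  # intercept + one feature
y = np.where(rng.uniform(size=8) < 0.5, 1, -1)

beta = np.array([0.5, -1.0])
pattern = sign_pattern(beta, X)

# Positive rescaling keeps beta inside the same equivalence class, so any
# quantity that depends only on the predicted labels is unchanged.
beta_scaled = 3.0 * beta
print(pattern == sign_pattern(beta_scaled, X))
print(training_error(beta, X, y) == training_error(beta_scaled, X, y))
```

Since each class is cut out by linear inequalities in β, an optimization restricted to one class has a convex feasible region.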
22 Computational Issues can be written as since g is non-decreasing.
23 Computational Issues
The problem reduces to the computation of a number of convex optimization problems.
The number of convex optimizations is reduced via use of g with a branch and bound algorithm.
With a sample size of N = 150 and 11 features, calculating the percentiles of the CUD bound using 500 bootstrap samples takes a few minutes on a standard desktop (2.4 GHz processor, 2 GB RAM).
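24 Comparisons, 95% CI
Sample size = 50

Data       CUD    BS     M      Y
Spam       1.0    .99    .63    1.0
Ion        .96    .80    .99
Heart      1.0    .99    .95    1.0
Diabetes   1.0    .91    .98    .99
Donut      .98    .90    .62    .88
Outlier    .99    .80    .93

Only three values survive for Ion and Outlier; they are listed in their original order, and which column is missing is not recoverable.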
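25 Comparisons, length of CI
Sample size = 50

Data       CUD    BS     M      Y
Spam       .56    .38    .25    .33
Ion        .34    .36    .24    .32
Heart      .47    .40    .44
Diabetes   .39    .31    .36
Donut      .49    .53    .26    .33
Outlier    .50    .39    .29    .33

Only three values survive for Heart and Diabetes; they are listed in their original order, and which column is missing is not recoverable.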
26 Intuition If then we are approximating the distribution of where
27 Intuition If and then the distribution is approximately that of the absolute value of a (limiting distribution for binomial, as expected).
28 Intuition If and the distribution is approximately the distribution of
29 Intuition Consider if in place of we put where is close to then due to the non-smoothness in at we will get jittering.
30 Discussion
Further reduce the conservatism of the CUD-bound.
–Eliminate symmetry of CUD-bound CI
–?
Trade off computational burden versus bias by use of a surrogate for the indicator in the misclassification rate.
The real goal is to produce CIs for the value of a policy.
31 The simplest dynamic treatment regime (e.g. policy) is a decision rule if there is only one stage of treatment
1 stage for each individual
Observation available at jth stage
Action at jth stage (usually a treatment)
Primary outcome:
32 Goal: Construct decision rules that input patient information and output a recommended action; these decision rules should lead to a maximal mean Y.
In the future one selects action:
33 Single Stage (k=1) Find a confidence interval for the mean outcome if a particular estimated policy (here one decision rule) is employed. Action A is randomized in {-1,1}. Suppose the decision rule is of form We do not assume the optimal decision boundary is linear.
34 Single Stage (k=1) Mean outcome following this policy is is the randomization probability
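The formula on this slide did not survive extraction. Since the action is randomized and the slide mentions the randomization probability, one standard way to write the mean outcome under a policy is the inverse-probability-weighted form E[Y · 1{A = d(X)} / p(A)], and the sketch below estimates it from simulated data. Treat this as an assumption about the intended expression, not a reconstruction of it; the linear rule d(x) = sign(β0 + β1 x), the outcome model, and the names are hypothetical.

```python
import numpy as np

def ipw_value(y, a, x, beta, p=0.5):
    """Inverse-probability-weighted estimate of the mean outcome under the
    rule d(x) = sign(beta0 + beta1 * x), where p = P(A = 1) by randomization."""
    d = np.where(beta[0] + beta[1] * x >= 0, 1, -1)  # recommended action
    prob_a = np.where(a == 1, p, 1.0 - p)            # probability of the observed action
    return float(np.mean(y * (a == d) / prob_a))

# Simulated single-stage trial (assumed model): A is randomized in {-1, 1}
# with probability 1/2, and the outcome is Y = 1 + A * X + noise.
rng = np.random.default_rng(4)
n = 20_000
x = rng.normal(size=n)
a = np.where(rng.uniform(size=n) < 0.5, 1, -1)
y = 1.0 + a * x + rng.normal(size=n)

beta = np.array([0.0, 1.0])  # the rule "treat with A = 1 when x >= 0"
print("estimated value:", ipw_value(y, a, x, beta))
```

Under this simulated model the rule sign(x) attains mean outcome 1 + E|X|, which gives a simple check on the estimator. The indicator 1{A = d(X)} makes the value non-smooth in the rule's coefficients, the same difficulty as for the misclassification rate.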
36 Oslin ExTENd
[Figure: trial design diagram. Labels: Early Trigger for Nonresponse; Late Trigger for Nonresponse; 8 wks; Response; Nonresponse; Random assignment; TDM + Naltrexone; Naltrexone; CBI; CBI + Naltrexone]
37 This seminar can be found at: http://www.stat.lsa.umich.edu/~samurphy/seminars/Stanford04.01.08.ppt
Email Eric or me with questions or if you would like a copy of the associated paper: laber@umich.edu or samurphy@umich.edu