1 Information Geometry on Classification: Logistic, AdaBoost, Area under ROC curve. Shinto Eguchi, ISM seminar on 17/1/2001. This talk is based on joint work with Dr J. Copas.
2 Outline
- Problem setting for classification: overview of classification methods
- Dw divergence of discriminant functions: definition from the Neyman-Pearson Lemma; expected and observed expressions
- Examples of Dw: logistic regression, AdaBoost, area under ROC curve, hit rate, credit scoring, medical screening
- Structure of Dw risk functions: optimal Dw under near-logistic, implemented by cross-validation
- Risk scores of skin cancer: area under ROC curve, comparison, discussion of other methods
[ http://juban.ism.ac.jp/ ]
3 Standard methods
- Fisher linear discriminant analysis [4]
- Logistic regression [Cornfield, 1962]
- Multilayer perceptron [http://juban.ism.ac.jp/file_ppt/ 公開講座 ( ニューラ ル ).ppt]
New approaches
- Boosting (combining weak learners): AdaBoost [http://juban.ism.ac.jp/file_ppt/ 公開講座( Boost ).ppt]
- Support vector machine (VC dimension) [http://juban.ism.ac.jp/file_ppt/open-svm12-21.ppt]
- Kernel method (Mercer theorem) [http://juban.ism.ac.jp/file_ppt/ 主成分発表原稿.ppt]
4 Problem setting. Input vector x ∈ R^p; output variable y ∈ {1, ..., K}. Definition: h is a classifier if h : R^p → {1, ..., K} is onto. The k-th decision space is D_k = h^{-1}(k), giving the direct sum R^p = D_1 ⊕ ... ⊕ D_K.
5 Probabilistic model. Joint distribution of x, y: p(x, y) = π_y p(x | y), where π_y is the prior distribution and p(x | y) is the conditional distribution of x given y. Misclassification error rate: P(h(x) ≠ y); hit rate: P(h(x) = y).
6 Discriminant function, classifier, and Bayes rule: given P(x, y), the Bayes rule minimizes the error rate. Training data (examples): (x_1, y_1), ..., (x_n, y_n), with x_i the i-th input.
7 Reduction of our problem to binary classification: output variable y ∈ {0, 1}. Log-likelihood ratio discriminant function f(x) = log{ p(x | y = 1) / p(x | y = 0) }; classifier h(x) = 1 if f(x) > c, 0 otherwise; error rate P(h(x) ≠ y).
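The log-likelihood-ratio rule can be made concrete with a small sketch (not in the slides). It assumes, purely as an illustration, equal-variance Gaussian class-conditionals, for which the log-likelihood ratio is linear in x:

```python
# Hypothetical setting: p(x | y) is Gaussian with mean mu1 (y = 1) or
# mu0 (y = 0) and common standard deviation sigma, so the log-likelihood
# ratio f(x) = log p(x | 1) - log p(x | 0) is linear in x.
def log_likelihood_ratio(x, mu1=1.0, mu0=-1.0, sigma=1.0):
    return ((mu1 - mu0) * x + (mu0 ** 2 - mu1 ** 2) / 2.0) / sigma ** 2

def classify(x, threshold=0.0):
    # Classify as y = 1 when f(x) exceeds the threshold c.
    return 1 if log_likelihood_ratio(x) > threshold else 0
```

With mu1 = 1, mu0 = -1, sigma = 1 this reduces to f(x) = 2x, so the decision boundary sits at x = 0.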
8 Other loss functions for classification. Credit scoring [5]: a cost model yields a profit if y = 1 and a loss if y = 0. General setting: let c(k, y) be the cost of classifying y as k; the expected cost is E[c(h(x), y)].
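The general cost setting can be sketched numerically. The cost matrix below is a made-up example (not from the talk), loosely in the spirit of credit scoring: accepting a bad customer is made five times as costly as rejecting a good one. The optimal decision minimizes the expected cost given P(y = 1 | x):

```python
# Hypothetical cost matrix: COST[(k, y)] = cost of classifying as k
# when the truth is y.
COST = {(1, 1): 0.0, (1, 0): 5.0, (0, 1): 1.0, (0, 0): 0.0}

def expected_cost(k, p1, cost=COST):
    """Expected cost of deciding k when P(y = 1 | x) = p1."""
    return cost[(k, 1)] * p1 + cost[(k, 0)] * (1.0 - p1)

def best_decision(p1, cost=COST):
    # Minimize the expected cost over the two possible decisions.
    return min((0, 1), key=lambda k: expected_cost(k, p1, cost))
```

Misclassification error rate is the special case where both misclassification costs equal 1.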
9 ROC (Receiver Operating Characteristic) curve. The four outcomes: hit (true positive), correct rejection (true negative), false negative, false positive.
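As an added sketch (not from the slides), the empirical ROC curve sweeps a threshold over the scores, tracing (false positive rate, hit rate) pairs, and the area under it follows from the trapezoidal rule. Distinct scores are assumed, so ties are not handled:

```python
def roc_points(scores, labels):
    """Empirical ROC: sort by decreasing score and sweep the threshold.
    Assumes distinct scores; labels are 0/1. Returns (FPR, TPR) pairs."""
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, y in sorted(zip(scores, labels), reverse=True):
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    # Trapezoidal area under the piecewise-linear ROC curve.
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

A perfectly separating score gives AUC 1; a useless one gives about 1/2.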
10 Main story: a linear discriminant function. Given training data, an objective function built from a pair of functions (U, V) defines the proposed estimator. What (U, V) is admissible? Logistic is OK.
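A rough sketch of such an objective, assuming (as an illustration, not the talk's exact definition) that it sums U(f(x_i)) over the y = 1 examples and V(f(x_i)) over the y = 0 examples; the (U, V) pair below recovers the negative logistic log-likelihood:

```python
import math

# Illustrative (U, V) pair reproducing the negative logistic log-likelihood:
# U penalizes small f on y = 1 examples, V penalizes large f on y = 0 ones.
def U(t):
    return math.log(1.0 + math.exp(-t))

def V(t):
    return math.log(1.0 + math.exp(t))

def objective(beta, xs, ys):
    """Objective for the linear discriminant f(x) = beta[0] + beta[1] * x."""
    b0, b1 = beta
    return sum(U(b0 + b1 * x) if y == 1 else V(b0 + b1 * x)
               for x, y in zip(xs, ys))
```

Other choices of (U, V) in the same template would give the other Dw examples listed later (AdaBoost, hit rate, and so on).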
11 A reinterpretation of the Neyman-Pearson Lemma for the log-likelihood ratio discriminant function. Proposition. Remark.
12 Proof of Proposition
13 Divergence Dw of a discriminant function. Definition. Expectation expression.
14 Proof
15 Sample expression, given a set of training data. Minimum Dw method for a statistical model F.
16 Examples of Dw divergence: (1) logistic regression; (2) hit rate, credit scoring, medical screening.
17 (3) Area under ROC curve. (4) AdaBoost: this Dw is the loss function of AdaBoost, cf. [7], [8].
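To make the comparison between the logistic and AdaBoost examples concrete (an added sketch, not from the slides), here are the margin losses behind them, written as functions of the margin m = y f(x) with y in {-1, +1}:

```python
import math

def zero_one_loss(margin):
    # Misclassification loss: 1 when the margin m = y * f(x) is non-positive.
    return 1.0 if margin <= 0 else 0.0

def logistic_loss(margin):
    # Smooth loss minimized by logistic regression (up to scaling).
    return math.log(1.0 + math.exp(-margin))

def exponential_loss(margin):
    # Loss minimized by AdaBoost.
    return math.exp(-margin)
```

Both smooth losses decrease in the margin, but the exponential loss grows much faster for large negative margins, which connects to the outlier sensitivity discussed on a later slide.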
18 Structure of Dw risk functions: the optimal Dw under a near-logistic model, implemented by cross-validation. Logistic (linear) parametric model: model distribution of x, y.
19 Estimating equation of minimum Dw methods Remark
20 Cauchy-Schwarz inequality. Parametric assumption.
21 Near-parametric assumption.
22 Our risk function of an estimator is its expected loss; but in our situation the true distribution is unknown. Cross-validated risk estimate: a bias term plus a variance term, each computed using the estimate obtained from the training data by leaving the i-th example out.
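The leave-one-out computation on this slide can be sketched generically; the `fit` and `loss` callables below are placeholders for the talk's estimator and Dw loss, not its actual definitions:

```python
def leave_one_out_risk(fit, loss, xs, ys):
    """Average loss on each example i of the model refitted without i."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Refit on the training data with the i-th example left out...
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        # ...then score the refitted model on the held-out example.
        total += loss(model, xs[i], ys[i])
    return total / n
```

Any estimator/loss pair of the right shape can be plugged in; the cross-validated risk above is this average, split into its bias and variance parts.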
24 Outlier.
25 Note.
34 References
[1] Begg, C. B., Satagopan, J. M. and Berwick, M. (1998). A new strategy for evaluating the impact of epidemiologic risk factors for cancer with applications to melanoma. J. Amer. Statist. Assoc. 93, 415-426.
[2] Berwick, M., Begg, C. B., Fine, J. A., Roush, G. C. and Barnhill, R. L. (1996). Screening for cutaneous melanoma by self skin examination. J. National Cancer Inst. 88, 17-23.
[3] Eguchi, S. and Copas, J. (2000). A class of logistic-type discriminant functions. Technical Report, Department of Statistics, University of Warwick.
[4] Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics 7, 179-188.
[5] Hand, D. J. and Henley, W. E. (1997). Statistical classification methods in consumer credit scoring: a review. J. Roy. Statist. Soc. A 160, 523-541.
[6] McLachlan, G. J. (1992). Discriminant Analysis and Statistical Pattern Recognition. Wiley: New York.
[7] Schapire, R., Freund, Y., Bartlett, P. and Lee, W. S. (1998). Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Statist. 26, 1651-1686.
[8] Vapnik, V. N. (1999). The Nature of Statistical Learning Theory. Springer: New York.