1 Information Geometry on Classification: Logistic, AdaBoost, Area under ROC curve. Shinto Eguchi, ISM seminar on 17/1/2001. This talk is based on joint work with Dr J. Copas.
2 Outline
- Problem setting for classification: overview of classification methods
- Dw divergence of discriminant functions: definition from the Neyman-Pearson Lemma; expected and observed expressions
- Examples of Dw: logistic regression, AdaBoost, area under ROC curve, hit rate, credit scoring, medical screening
- Structure of Dw risk functions: optimal Dw under near-logistic, implemented by cross-validation
- Risk scores of skin cancer: area under ROC curve, comparison, discussion of other methods
[ http://juban.ism.ac.jp/ ]
3 Standard methods
- Fisher linear discriminant analysis [4]
- Logistic regression [Cornfield, 1962]
- Multilayer perceptron [http://juban.ism.ac.jp/file_ppt/ 公開講座 ( ニューラ ル ).ppt]
New approaches
- Boosting (combining weak learners): AdaBoost [http://juban.ism.ac.jp/file_ppt/ 公開講座( Boost ).ppt]
- Support vector machine (VC dimension) [http://juban.ism.ac.jp/file_ppt/open-svm12-21.ppt]
- Kernel method (Mercer theorem) [http://juban.ism.ac.jp/file_ppt/ 主成分発表原稿.ppt]
4 Problem setting. Input vector x ∈ R^p; output variable y ∈ {1, ..., K}. Definition: h is a classifier if h : R^p → {1, ..., K} is onto. The k-th decision space is D_k = h^{-1}(k), giving the direct sum R^p = D_1 ⊕ ... ⊕ D_K.
5 Probabilistic model. Joint distribution of x, y: p(x, y) = π_y p(x | y), where π_y is the prior distribution and p(x | y) is the conditional distribution of x given y. Misclassification error rate: P(h(x) ≠ y); hit rate: P(h(x) = y).
6 Discriminant function, classifier, and Bayes rule: given P(x, y), the Bayes rule minimizes the error rate. Training data (examples): (x_1, y_1), ..., (x_n, y_n), with x_i the i-th input.
7 Reduction of our problem to binary classification: output variable y ∈ {0, 1}. Log-likelihood ratio discriminant function f(x) = log{ p(x | y = 1) / p(x | y = 0) }; classifier h(x) = 1 if f(x) > c, 0 otherwise; error rate P(h(x) ≠ y).
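The log-likelihood-ratio rule can be made concrete with a small sketch (not in the slides). It assumes, purely as an illustration, equal-variance Gaussian class-conditionals, for which the log-likelihood ratio is linear in x:

```python
# Hypothetical setting: p(x | y) is Gaussian with mean mu1 (y = 1) or
# mu0 (y = 0) and common standard deviation sigma, so the log-likelihood
# ratio f(x) = log p(x | 1) - log p(x | 0) is linear in x.
def log_likelihood_ratio(x, mu1=1.0, mu0=-1.0, sigma=1.0):
    return ((mu1 - mu0) * x + (mu0 ** 2 - mu1 ** 2) / 2.0) / sigma ** 2

def classify(x, threshold=0.0):
    # Classify as y = 1 when f(x) exceeds the threshold c.
    return 1 if log_likelihood_ratio(x) > threshold else 0
```

With mu1 = 1, mu0 = -1, sigma = 1 this reduces to f(x) = 2x, so the decision boundary sits at x = 0.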
8 Other loss functions for classification. Credit scoring [5]: a cost model yields a profit if y = 1 and a loss if y = 0. General setting: let c(k, y) be the cost of classifying y as k; the expected cost is E[c(h(x), y)].
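The general cost setting can be sketched numerically. The cost matrix below is a made-up example (not from the talk), loosely in the spirit of credit scoring: accepting a bad customer is made five times as costly as rejecting a good one. The optimal decision minimizes the expected cost given P(y = 1 | x):

```python
# Hypothetical cost matrix: COST[(k, y)] = cost of classifying as k
# when the truth is y.
COST = {(1, 1): 0.0, (1, 0): 5.0, (0, 1): 1.0, (0, 0): 0.0}

def expected_cost(k, p1, cost=COST):
    """Expected cost of deciding k when P(y = 1 | x) = p1."""
    return cost[(k, 1)] * p1 + cost[(k, 0)] * (1.0 - p1)

def best_decision(p1, cost=COST):
    # Minimize the expected cost over the two possible decisions.
    return min((0, 1), key=lambda k: expected_cost(k, p1, cost))
```

Misclassification error rate is the special case where both misclassification costs equal 1.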
9 ROC (Receiver Operating Characteristic) curve. The four outcomes: hit (true positive), correct rejection (true negative), false negative, false positive.
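As an added sketch (not from the slides), the empirical ROC curve sweeps a threshold over the scores, tracing (false positive rate, hit rate) pairs, and the area under it follows from the trapezoidal rule. Distinct scores are assumed, so ties are not handled:

```python
def roc_points(scores, labels):
    """Empirical ROC: sort by decreasing score and sweep the threshold.
    Assumes distinct scores; labels are 0/1. Returns (FPR, TPR) pairs."""
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, y in sorted(zip(scores, labels), reverse=True):
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    # Trapezoidal area under the piecewise-linear ROC curve.
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

A perfectly separating score gives AUC 1; a useless one gives about 1/2.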
10 Main story: a linear discriminant function. Given training data, an objective function built from a pair of functions (U, V) defines the proposed estimator. What (U, V) is admissible? Logistic is OK.
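A rough sketch of such an objective, assuming (as an illustration, not the talk's exact definition) that it sums U(f(x_i)) over the y = 1 examples and V(f(x_i)) over the y = 0 examples; the (U, V) pair below recovers the negative logistic log-likelihood:

```python
import math

# Illustrative (U, V) pair reproducing the negative logistic log-likelihood:
# U penalizes small f on y = 1 examples, V penalizes large f on y = 0 ones.
def U(t):
    return math.log(1.0 + math.exp(-t))

def V(t):
    return math.log(1.0 + math.exp(t))

def objective(beta, xs, ys):
    """Objective for the linear discriminant f(x) = beta[0] + beta[1] * x."""
    b0, b1 = beta
    return sum(U(b0 + b1 * x) if y == 1 else V(b0 + b1 * x)
               for x, y in zip(xs, ys))
```

Other choices of (U, V) in the same template would give the other Dw examples listed later (AdaBoost, hit rate, and so on).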
11 A reinterpretation of the Neyman-Pearson Lemma for the log-likelihood ratio discriminant function. Proposition. Remark.
12 Proof of Proposition
13 Divergence Dw of a discriminant function. Definition. Expectation expression.
14 Proof
15 Sample expression, given a set of training data. Minimum Dw method for a statistical model F.
16 Examples of Dw divergence: (1) logistic regression; (2) hit rate, credit scoring, medical screening.
17 (3) Area under ROC curve. (4) AdaBoost: this Dw is the loss function of AdaBoost, cf. [7], [8].
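To make the comparison between the logistic and AdaBoost examples concrete (an added sketch, not from the slides), here are the margin losses behind them, written as functions of the margin m = y f(x) with y in {-1, +1}:

```python
import math

def zero_one_loss(margin):
    # Misclassification loss: 1 when the margin m = y * f(x) is non-positive.
    return 1.0 if margin <= 0 else 0.0

def logistic_loss(margin):
    # Smooth loss minimized by logistic regression (up to scaling).
    return math.log(1.0 + math.exp(-margin))

def exponential_loss(margin):
    # Loss minimized by AdaBoost.
    return math.exp(-margin)
```

Both smooth losses decrease in the margin, but the exponential loss grows much faster for large negative margins, which connects to the outlier sensitivity discussed on a later slide.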
18 Structure of Dw risk functions: the optimal Dw under a near-logistic model, implemented by cross-validation. Logistic (linear) parametric model: model distribution of x, y.
19 Estimating equation of minimum Dw methods Remark
20 Cauchy-Schwarz inequality. Parametric assumption.
21 Near-parametric assumption.
22 Our risk function of an estimator is its expected loss; but in our situation the true distribution is unknown. Cross-validated risk estimate: a bias term plus a variance term, each computed using the estimate obtained from the training data by leaving the i-th example out.
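The leave-one-out computation on this slide can be sketched generically; the `fit` and `loss` callables below are placeholders for the talk's estimator and Dw loss, not its actual definitions:

```python
def leave_one_out_risk(fit, loss, xs, ys):
    """Average loss on each example i of the model refitted without i."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Refit on the training data with the i-th example left out...
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        # ...then score the refitted model on the held-out example.
        total += loss(model, xs[i], ys[i])
    return total / n
```

Any estimator/loss pair of the right shape can be plugged in; the cross-validated risk above is this average, split into its bias and variance parts.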
24 Outlier.
25 Note.
34 References
[1] Begg, C. B., Satagopan, J. M. and Berwick, M. (1998). A new strategy for evaluating the impact of epidemiologic risk factors for cancer with applications to melanoma. J. Amer. Statist. Assoc. 93, 415-426.
[2] Berwick, M., Begg, C. B., Fine, J. A., Roush, G. C. and Barnhill, R. L. (1996). Screening for cutaneous melanoma by self skin examination. J. National Cancer Inst. 88, 17-23.
[3] Eguchi, S. and Copas, J. (2000). A class of logistic-type discriminant functions. Technical Report, Department of Statistics, University of Warwick.
[4] Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics 7, 179-188.
[5] Hand, D. J. and Henley, W. E. (1997). Statistical classification methods in consumer credit scoring: a review. J. Roy. Statist. Soc. A 160, 523-541.
[6] McLachlan, G. J. (1992). Discriminant Analysis and Statistical Pattern Recognition. Wiley: New York.
[7] Schapire, R., Freund, Y., Bartlett, P. and Lee, W. S. (1998). Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Statist. 26, 1651-1686.
[8] Vapnik, V. N. (1999). The Nature of Statistical Learning Theory. Springer: New York.