
1 Information Geometry on Classification: Logistic, AdaBoost, Area under ROC curve. Shinto Eguchi. ISM seminar, 17/1/2001. This talk is based on joint work with Dr J. Copas.

2 Outline
- Problem setting for classification: overview of classification methods, Dw classification
- Dw divergence of discriminant functions: definition from the NP Lemma; expected and observed expressions
- Examples of Dw: logistic regression, AdaBoost, area under ROC curve, hit rate, credit scoring, medical screening
- Structure of Dw risk functions: optimal Dw under near-logistic, implemented by cross-validation
- Risk scores of skin cancer: area under ROC curve, comparison, discussion of other methods
[http://juban.ism.ac.jp/]

3 Standard methods
- Fisher linear discriminant analysis [4]
- Logistic regression [Cornfield, 1962]
- Multilayer perceptron [http://juban.ism.ac.jp/file_ppt/公開講座(ニューラル).ppt]
New approaches
- Boosting (combining weak learners): AdaBoost [http://juban.ism.ac.jp/file_ppt/公開講座(Boost).ppt]
- Support vector machine (VC dimension) [http://juban.ism.ac.jp/file_ppt/open-svm12-21.ppt]
- Kernel method (Mercer's theorem) [http://juban.ism.ac.jp/file_ppt/主成分発表原稿.ppt]

4 Problem setting. Input vector x ∈ R^p; output variable y ∈ {1, ..., K}. Definition: h is a classifier if h: R^p → {1, ..., K} is onto. D_k = {x : h(x) = k} is the k-th decision space, and the input space is the direct sum of D_1, ..., D_K.
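To make the setting concrete, here is a minimal sketch (assuming, as on the slide, an input vector in R^p and labels 1, ..., K): a classifier h induces the decision spaces D_k = {x : h(x) = k}, which partition the input space. The toy classifier below is illustrative only.

```python
import numpy as np

def decision_region(h, k, X):
    """Rows of X lying in the k-th decision space D_k = {x : h(x) = k}."""
    labels = np.array([h(x) for x in X])
    return X[labels == k]

# Toy classifier on R^2: label 1 if the first coordinate is positive, else 2.
h = lambda x: 1 if x[0] > 0 else 2
X = np.random.randn(6, 2)
print(decision_region(h, 1, X))  # the sampled points with positive first coordinate
```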

5 Probabilistic model. Joint distribution of x, y: p(x, y) = π_y p(x | y), where π_y is the prior distribution and p(x | y) is the conditional distribution of x given y. Misclassification error rate and hit rate.
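A sketch of the two empirical quantities named on the slide. Here "hit rate" is taken to mean the proportion of y = 1 cases that the classifier calls 1; the slide's exact convention is not recoverable from the transcript.

```python
import numpy as np

def error_rate(y_true, y_pred):
    """Empirical misclassification error rate: fraction of cases with y_pred != y_true."""
    return float(np.mean(np.asarray(y_pred) != np.asarray(y_true)))

def hit_rate(y_true, y_pred, positive=1):
    """Fraction of true class-`positive` cases that are classified as `positive`."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_pred[y_true == positive] == positive))
```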

6 Given P(x, y): discriminant function, classifier, and Bayes rule. Training data (examples) (x_1, y_1), ..., (x_n, y_n), where x_i is the i-th input.
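When P(x, y) is fully known, the Bayes rule assigns x to the class with the largest posterior probability, equivalently the largest prior times class-conditional density. A small sketch; the Gaussian densities are purely illustrative.

```python
import numpy as np
from scipy.stats import norm

def bayes_rule(x, priors, densities):
    """Bayes classifier: assign x to the class k maximising prior_k * p_k(x)."""
    return int(np.argmax([pi * p(x) for pi, p in zip(priors, densities)]))

# Two classes with illustrative Gaussian class-conditional densities on the line.
densities = [norm(-1, 1).pdf, norm(1, 1).pdf]
priors = [0.5, 0.5]
print(bayes_rule(0.3, priors, densities))  # 1, i.e. the second class
```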

7 Reduction of our problem to binary classification: output variable y ∈ {0, 1}, log-likelihood ratio, discriminant function, classifier, error rate.
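In the binary case the Bayes rule can be expressed through the log-likelihood ratio, h(x) = log p1(x)/p0(x) + log(π1/π0), predicting y = 1 whenever h(x) > 0. A minimal sketch with illustrative densities:

```python
import numpy as np
from scipy.stats import norm

def llr_discriminant(x, p1, p0, pi1=0.5, pi0=0.5):
    """Log-likelihood-ratio discriminant h(x) = log p1(x)/p0(x) + log(pi1/pi0);
    the induced classifier predicts y = 1 whenever h(x) > 0."""
    return np.log(p1(x)) - np.log(p0(x)) + np.log(pi1 / pi0)

p1, p0 = norm(1, 1).pdf, norm(-1, 1).pdf
print(llr_discriminant(0.4, p1, p0) > 0)  # True: classify x = 0.4 as y = 1
```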

8 Other loss functions for classification. Credit scoring [5]: a cost model with a profit if y = 1 and a loss if y = 0. General setting: let c(y, k) be the cost of classifying y as k; the expected cost is E[c(y, h(x))].
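A sketch of the general cost setting: with c(j, k) the cost of classifying a class-j case as k, the expected cost is estimated by the average cost on data. The 2x2 cost matrix below is purely hypothetical, in the spirit of the credit-scoring example.

```python
import numpy as np

def average_cost(y_true, y_pred, cost):
    """Empirical expected cost, with cost[j][k] the cost of classifying y = j as k."""
    return float(np.mean([cost[j][k] for j, k in zip(y_true, y_pred)]))

# Hypothetical credit-scoring costs: accepting a bad risk (y = 0 classified as 1)
# costs 5, rejecting a good customer costs 1, a correct acceptance is a profit of 1.
cost = [[0.0, 5.0],    # true y = 0
        [1.0, -1.0]]   # true y = 1
print(average_cost([1, 0, 1, 1, 0], [1, 0, 1, 0, 1], cost))  # 0.8
```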

9 ROC (Receiver Operating Characteristic) curve: hit, correct rejection, false negative, false positive.
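A short sketch of how the ROC curve is traced out: for each threshold on the discriminant score, the hit rate (true-positive rate) is plotted against the false-positive rate.

```python
import numpy as np

def roc_points(scores, y):
    """Hit rate (TPR) and false-positive rate (FPR) at each score threshold;
    plotting TPR against FPR gives the ROC curve."""
    scores, y = np.asarray(scores, float), np.asarray(y)
    thresholds = np.sort(np.unique(scores))[::-1]
    tpr = np.array([np.mean(scores[y == 1] >= t) for t in thresholds])
    fpr = np.array([np.mean(scores[y == 0] >= t) for t in thresholds])
    return fpr, tpr

fpr, tpr = roc_points([0.9, 0.8, 0.4, 0.3, 0.2], [1, 1, 0, 1, 0])
print(fpr, tpr)
```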

10 Main story. Linear discriminant function; given training data, an objective function defined by a pair of functions (U, V) and the proposed estimator. What is (U, V)? Logistic is OK.
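The exact (U, V) objective is given in Eguchi and Copas [3] and is not recoverable from this transcript. The sketch below assumes one common convention, an objective of the form (1/n)[Σ_{y_i=1} U(−h(x_i)) + Σ_{y_i=0} V(h(x_i))] for a linear h(x) = β'x; with U(t) = V(t) = log(1 + e^t) this reduces to the negative logistic log-likelihood ("Logistic is OK").

```python
import numpy as np
from scipy.optimize import minimize

def uv_objective(beta, X, y, U, V):
    """Assumed (U, V)-type criterion for a linear discriminant h(x) = X @ beta:
    mean of U(-h) over the y = 1 cases plus V(h) over the y = 0 cases.
    With U(t) = V(t) = log(1 + exp(t)) this is the negative logistic log-likelihood."""
    h = X @ beta
    return (np.sum(U(-h[y == 1])) + np.sum(V(h[y == 0]))) / len(y)

softplus = lambda t: np.log1p(np.exp(t))

rng = np.random.default_rng(0)
X = np.c_[np.ones(200), rng.normal(size=(200, 2))]   # intercept plus two covariates
y = (X[:, 1] + 0.5 * rng.normal(size=200) > 0).astype(int)
fit = minimize(uv_objective, np.zeros(3), args=(X, y, softplus, softplus))
print(fit.x)  # logistic-regression-type estimate of beta
```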

11 A reinterpretation of the Neyman-Pearson Lemma: log-likelihood ratio discriminant function; Proposition; Remark.

12 Proof of Proposition

13 Divergence Dw of a discriminant function: definition and expectation expression.

14 Proof

15 Sample expression, given a set of training data; the minimum Dw method for a statistical model F.

16 Examples of Dw divergence: (1) logistic regression; (2) hit rate, credit scoring, medical screening.
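The Dw expressions on this slide are not recoverable from the transcript, but the two families of examples correspond to losses of the following flavours: a smooth logistic log-likelihood-type loss, and piecewise-constant cost-weighted losses of the kind behind hit rate, credit scoring and screening. A hedged sketch (the cost weights c0, c1 and the threshold are illustrative):

```python
import numpy as np

def logistic_loss(h, y):
    """Negative logistic log-likelihood for scores h and labels y in {0, 1}."""
    h, y = np.asarray(h, float), np.asarray(y)
    return float(np.mean(np.log1p(np.exp(-(2 * y - 1) * h))))

def weighted_zero_one_loss(h, y, c0=1.0, c1=1.0, threshold=0.0):
    """Cost-weighted 0-1 criterion: c1 for a missed y = 1 case, c0 for a falsely
    accepted y = 0 case, as in hit-rate / credit-scoring / screening criteria."""
    h, y = np.asarray(h, float), np.asarray(y)
    pred = (h > threshold).astype(int)
    return float(np.mean(np.where(y == 1, c1 * (pred == 0), c0 * (pred == 1))))
```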

17 (3) Area under ROC curve; (4) AdaBoost: this Dw is the loss function of AdaBoost, cf. [7], [8].
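The slide identifies one member of the family with AdaBoost's loss. The exponential loss used by AdaBoost and the area under the ROC curve (a Mann-Whitney-type rank statistic) can be sketched as:

```python
import numpy as np

def exponential_loss(h, y):
    """AdaBoost's exponential loss: mean of exp(-y~ h) with y~ = 2y - 1 in {-1, +1}."""
    h, y = np.asarray(h, float), np.asarray(y)
    return float(np.mean(np.exp(-(2 * y - 1) * h)))

def area_under_roc(h, y):
    """AUC as the probability that a random y = 1 case scores above a random
    y = 0 case, counting ties as 1/2 (Mann-Whitney form)."""
    h, y = np.asarray(h, float), np.asarray(y)
    diff = h[y == 1][:, None] - h[y == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))
```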

18 Structure of Dw risk functions: optimal Dw under near-logistic, implemented by cross-validation. Logistic (linear) parametric model for the distribution of x, y.
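A one-line sketch of the logistic (linear) parametric model for the conditional distribution of y given x:

```python
import numpy as np

def logistic_prob(x, beta):
    """Logistic (linear) model: P(y = 1 | x) = exp(beta'x) / (1 + exp(beta'x))."""
    return 1.0 / (1.0 + np.exp(-np.dot(beta, x)))
```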

19 Estimating equation of the minimum Dw methods; Remark.

20 Cauchy-Schwarz inequality; parametric assumption.

21 Near-parametric assumption

22 Our risk function of an estimator, and our actual situation. Cross-validated risk estimate: a bias term plus a variance term, where the variance term uses the estimate computed from the training data with the i-th example left out.
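The slide's bias and variance terms are not recoverable from the transcript; the sketch below shows only the generic leave-one-out construction it relies on: refit with the i-th example held out and average the loss of each such fit on its held-out point. The `fit` and `loss` callables are placeholders for whichever estimator and loss are in use.

```python
import numpy as np

def loo_risk(X, y, fit, loss):
    """Leave-one-out risk estimate: refit on the data with the i-th example
    removed and average the loss of that fit on the held-out (x_i, y_i)."""
    n = len(y)
    out = []
    for i in range(n):
        keep = np.arange(n) != i
        model = fit(X[keep], y[keep])   # estimate from the training data leaving i out
        out.append(loss(model, X[i], y[i]))
    return float(np.mean(out))
```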

24 Outlier

25 Note

34 References
[1] Begg, C. B., Satagopan, J. M. and Berwick, M. (1998). A new strategy for evaluating the impact of epidemiologic risk factors for cancer with applications to melanoma. J. Amer. Statist. Assoc. 93, 415-426.
[2] Berwick, M., Begg, C. B., Fine, J. A., Roush, G. C. and Barnhill, R. L. (1996). Screening for cutaneous melanoma by self skin examination. J. National Cancer Inst. 88, 17-23.
[3] Eguchi, S. and Copas, J. (2000). A Class of Logistic-type Discriminant Functions. Technical Report, Department of Statistics, University of Warwick.
[4] Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics 7, 179-188.
[5] Hand, D. J. and Henley, W. E. (1997). Statistical classification methods in consumer credit scoring: a review. J. Roy. Statist. Soc. A 160, 523-541.
[6] McLachlan, G. J. (1992). Discriminant Analysis and Statistical Pattern Recognition. Wiley, New York.
[7] Schapire, R., Freund, Y., Bartlett, P. and Lee, W. S. (1998). Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Statist. 26, 1651-1686.
[8] Vapnik, V. N. (1999). The Nature of Statistical Learning Theory. Springer, New York.

