


1 Kernel Classifiers from a Machine Learning Perspective (sec. 2.1-2.2) Jin-San Yang, Biointelligence Laboratory, School of Computer Science and Engineering, Seoul National University

2 (C) 2005, SNU Biointelligence Lab, http://bi.snu.ac.kr/
2.1 The Basic Setting
Definition 2.1 (Learning problem): find the unknown (functional) relationship h between objects x and targets y based on a sample z of size m.
For a given object x, evaluate the conditional distribution P_{Y|X=x} and decide on the class y by y = argmax_{y ∈ Y} P_{Y|X=x}(y).
Problems with estimating P_Z based on the given sample alone:
 - We cannot predict classes for a new object.
 - We need to constrain the set of possible mappings from objects to classes.
Definition 2.2 (Features and feature space): a feature map φ: X → K assigns to each object x a feature vector x = φ(x) in the feature space K.
Definition 2.4 (Linear functions and linear classifiers): f_w(x) = ⟨φ(x), w⟩ and h_w(x) = sign(f_w(x)).
 - Similar objects are mapped to similar classes via linearity.
 - A linear classifier is unaffected by the scale of the weight vector w (hence w is assumed to be of unit length).
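As a concrete illustration of Definition 2.4, a minimal sketch of a linear classifier in Python; the particular feature map phi, the weight vector w, and the {-1, +1} labels are illustrative assumptions, not taken from the slides.

    import numpy as np

    def phi(x):
        # Illustrative feature map: map a raw object (here a 2-d point) to a
        # feature vector in K; the identity map is used for simplicity.
        return np.asarray(x, dtype=float)

    def linear_classifier(w, x):
        # h_w(x) = sign(<phi(x), w>), with sign(0) mapped to +1 by convention.
        return 1 if np.dot(phi(x), w) >= 0 else -1

    w = np.array([0.6, 0.8])                   # unit-length weight vector
    print(linear_classifier(w, [1.0, 0.5]))    # -> 1
    print(linear_classifier(w, [-1.0, -0.5]))  # -> -1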

3 (C) 2005, SNU Biointelligence Lab, http://bi.snu.ac.kr/
Since F is isomorphic to W, the task of learning reduces to finding the best classifier in the hypothesis space F.
Properties a measure of goodness of a classifier should have:
 - It depends on the unknown P_Z.
 - It makes the resulting maximization task computationally easier.
 - It is pointwise w.r.t. the object-class pairs (due to the independence of the samplings).
Expected risk: R[f] = E_{XY}[ l(f(X), Y) ], where l is the loss function.
Example 2.7 (Cost matrices): in classifying handwritten digits, the 0-1 loss function is inappropriate because the classes are unbalanced (there are approximately 10 times more "no pictures of 1" than "pictures of 1"); a cost matrix weights the different error types differently.
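One possible reading of Example 2.7 in code: a cost-matrix loss that penalizes the two kinds of error asymmetrically, replacing the symmetric 0-1 loss. The particular cost values below are made up for illustration only.

    import numpy as np

    # Rows: true class (0 = "no picture of 1", 1 = "picture of 1").
    # Columns: predicted class. Misclassifying the rare class is made 10x costlier.
    C = np.array([[0.0, 1.0],
                  [10.0, 0.0]])

    def cost_loss(y_true, y_pred):
        return C[y_true, y_pred]

    def empirical_cost(y_true_list, y_pred_list):
        # Empirical estimate of the risk under the cost-matrix loss.
        return np.mean([cost_loss(t, p) for t, p in zip(y_true_list, y_pred_list)])

    print(empirical_cost([0, 1, 1, 0], [0, 0, 1, 1]))  # -> (0 + 10 + 0 + 1) / 4 = 2.75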

4 (C) 2005, SNU Biointelligence Lab, http://bi.snu.ac.kr/
Remark 2.8 (Geometrical picture): linear classifiers, parameterized by the weight vector w, correspond to hyperplanes passing through the origin in the feature space K.
(Figures: the hypothesis space and the feature space.)
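A small numerical check of Remark 2.8 (the numbers are arbitrary): the decision boundary of a linear classifier is the set {x : ⟨w, x⟩ = 0}, a hyperplane through the origin with normal vector w.

    import numpy as np

    w = np.array([0.6, 0.8])
    # The origin always lies on the decision boundary ...
    print(np.dot(w, np.zeros(2)) == 0.0)               # -> True
    # ... and so does any vector orthogonal to w.
    x_on_boundary = np.array([0.8, -0.6])
    print(np.isclose(np.dot(w, x_on_boundary), 0.0))    # -> True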

5 (C) 2005, SNU Biointelligence Lab, http://bi.snu.ac.kr/
2.2 Learning by Risk Minimization
Definition 2.9 (Learning algorithm): a mapping A: ∪_{m∈N} (X × Y)^m → F that assigns a classifier to every training sample (X: object space, Y: output space, F: hypothesis space).
 - We have no knowledge of the function (or of P_Z) to be optimized.
Definition 2.10 (Generalization error): R[A, z] = R[A(z)] - inf_{f ∈ F} R[f], the difference between the expected risk of the learned classifier and the best expected risk achievable in F.
Definition 2.11 (Empirical risk): the empirical risk functional over F, or training error of f, is defined as R_emp[f, z] = (1/m) Σ_{i=1}^m l(f(x_i), y_i).
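A minimal sketch of the empirical risk of Definition 2.11 under the 0-1 loss; the function and variable names are mine, and the tiny sample only serves as a usage example.

    import numpy as np

    def zero_one_loss(y_pred, y_true):
        return 0.0 if y_pred == y_true else 1.0

    def empirical_risk(f, sample):
        # sample is a list of (x_i, y_i) pairs; f is a classifier x -> y.
        return np.mean([zero_one_loss(f(x), y) for x, y in sample])

    # Usage with a linear classifier and an arbitrary weight vector:
    w = np.array([0.6, 0.8])
    f = lambda x: 1 if np.dot(x, w) >= 0 else -1
    z = [(np.array([1.0, 0.5]), 1), (np.array([-1.0, -0.5]), 1)]
    print(empirical_risk(f, z))  # -> 0.5 (one of the two examples is misclassified)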

6 (C) 2005, SNU Biointelligence Lab, http://bi.snu.ac.kr/
2.2.1 The (Primal) Perceptron Algorithm
When a training example (x_i, y_i) is misclassified by the current linear classifier, i.e., y_i ⟨x_i, w⟩ ≤ 0, the update step amounts to changing w into w + y_i x_i, which attracts the hyperplane toward the misclassified point.
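A runnable sketch of the primal perceptron, assuming labels in {-1, +1} and feature vectors that are already computed; the learning rate eta, the epoch limit, and the toy data are implementation choices, not part of the slide.

    import numpy as np

    def perceptron(X, y, eta=1.0, max_epochs=100):
        # X: (m, n) array of feature vectors, y: (m,) array of labels in {-1, +1}.
        w = np.zeros(X.shape[1])
        for _ in range(max_epochs):
            mistakes = 0
            for x_i, y_i in zip(X, y):
                if y_i * np.dot(w, x_i) <= 0:   # x_i misclassified (or on the boundary)
                    w += eta * y_i * x_i        # update attracts the hyperplane toward x_i
                    mistakes += 1
            if mistakes == 0:                   # w is consistent with the training sample
                break
        return w

    X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
    y = np.array([1, 1, -1, -1])
    w = perceptron(X, y)
    print(np.sign(X @ w))  # -> [ 1.  1. -1. -1.]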

7 (C) 2005, SNU Biointelligence Lab, http://bi.snu.ac.kr/
Definition 2.12 (Version space): the set of all classifiers consistent with the training sample. Given the training sample z and a hypothesis space F, V(z) = {f ∈ F | ∀i ∈ {1, ..., m}: f(x_i) = y_i}.
 - For linear classifiers, the set of consistent weight vectors is also called a version space: V(z) = {w ∈ W | ∀i ∈ {1, ..., m}: y_i ⟨x_i, w⟩ > 0}.
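Following Definition 2.12, a brief sketch (with made-up data) that checks whether a given weight vector lies in the version space, i.e., classifies every training example correctly.

    import numpy as np

    def in_version_space(w, X, y):
        # w is consistent with the sample iff y_i * <x_i, w> > 0 for all i.
        return bool(np.all(y * (X @ w) > 0))

    X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0]])
    y = np.array([1, 1, -1])
    print(in_version_space(np.array([1.0, 1.0]), X, y))   # -> True
    print(in_version_space(np.array([1.0, -1.0]), X, y))  # -> False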

8 (C) 2005, SNU Biointelligence Lab, http://bi.snu.ac.kr/
2.2.2 Regularized Risk Functional
Drawbacks of minimizing the empirical risk alone:
 - ERM makes the learning task an ill-posed one: a slight variation of the training sample can cause a large deviation of the expected risk (overfitting).
Regularization is one way to overcome this problem (see the sketch after this list):
 - Introduce a regularizer Ω[f] a priori and minimize the regularized risk R_reg[f, z] = R_emp[f, z] + λ Ω[f].
 - This restricts the space of solutions to compact subsets of the (originally overly large) space F, which can be achieved by requiring the set {f ∈ F | Ω[f] ≤ ε} to be compact for each positive number ε.
 - If we decrease λ for increasing sample size in the right way, it can be shown that the regularization method leads to R[f_z] → inf_{f ∈ F} R[f] as m → ∞.
 - λ → 0 minimizes only the empirical risk; λ → ∞ minimizes only the regularizer.
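As one concrete instance of a regularized risk functional (not the only one, and not the specific choice made in the book), a sketch of regularized least squares with the squared-norm regularizer Ω[w] = ||w||^2; the data and the values of lambda are arbitrary.

    import numpy as np

    def regularized_least_squares(X, y, lam):
        # Minimizes (1/m) * ||X w - y||^2 + lam * ||w||^2 over w;
        # closed-form solution of the corresponding normal equations.
        m, n = X.shape
        return np.linalg.solve(X.T @ X / m + lam * np.eye(n), X.T @ y / m)

    X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    print(regularized_least_squares(X, y, lam=0.0))   # empirical risk only
    print(regularized_least_squares(X, y, lam=10.0))  # heavy regularization: w shrinks toward 0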

9 (C) 2005, SNU Biointelligence Lab, http://bi.snu.ac.kr/
Structural risk minimization (SRM), by Vapnik:
 - Define a structuring of the hypothesis space F into nested subsets of increasing complexity.
 - In each subset, empirical risk minimization is performed.
 - SRM returns the classifier with the smallest guaranteed risk, trading off training error against the complexity of the subset it comes from.
Maximum-a-posteriori (MAP) estimation: interpret the empirical risk as the negative log-probability of the training sample z given a classifier f.
 - The MAP estimate maximizes the posterior density, i.e., it is its mode.
 - The choice of regularizer is comparable to the choice of the prior probability in the Bayesian framework and hence reflects prior knowledge.
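A compact way to see the MAP correspondence, written as a short derivation; the notation P(f), P(z | f) and the identification of the two terms are my reading of the slide, stated here under standard Bayesian assumptions.

    \begin{align*}
    f_{\mathrm{MAP}} &= \operatorname*{arg\,max}_{f}\; \mathbf{P}(f \mid z)
                      = \operatorname*{arg\,max}_{f}\; \mathbf{P}(z \mid f)\,\mathbf{P}(f) \\
                     &= \operatorname*{arg\,min}_{f}\; \bigl[-\ln \mathbf{P}(z \mid f) - \ln \mathbf{P}(f)\bigr],
    \end{align*}

where the first term plays the role of the empirical risk and the second the role of the regularizer.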

10 (C) 2005, SNU Biointelligence Lab, http://bi.snu.ac.kr/

