Learning Kernel Classifiers
Chap. 3.3 Relevance Vector Machine / Chap. 3.4 Bayes Point Machines
Summarized by Sang Kyun Lee, 13th May 2005
3.3 Relevance Vector Machine
● [M. Tipping, JMLR 2001]
● A modification of the Gaussian process (GP) model
  – GP
    ● Prior: w ~ Normal(0, I)
    ● Likelihood: t | x, w ~ Normal(⟨x, w⟩, σ_t²)
    ● Posterior: Gaussian, Normal(μ, Σ)
  – RVM
    ● Prior: w ~ Normal(0, Θ) with Θ = diag(θ_1, ..., θ_n)
    ● Likelihood: same as GP
    ● Posterior: again Gaussian, now with Σ = (σ_t^-2 X^T X + Θ^-1)^-1 and μ = σ_t^-2 Σ X^T t
● Reasons
  – To get a sparse representation of the weight vector w
  – The expected risk of the classifier can be bounded in terms of the number of non-zero coefficients of w
  ● Thus, we favor weight vectors with a small number of non-zero coefficients
  – One way to achieve this is to modify the prior: w ~ Normal(0, Θ), Θ = diag(θ_1, ..., θ_n)
  – Consider θ_i → 0
    ● Then w_i = 0 is the only possible value
    ● Computation of the posterior becomes easier than before, since only the dimensions with θ_i > 0 effectively remain
● Prediction function
  – GP: t̂(x) = k(x)^T (G + σ_t² I)^-1 t, where G is the m×m Gram matrix and k(x)_i = k(x_i, x)
  – RVM: t̂(x) = ⟨x, μ⟩, using the (sparse) posterior mean μ of the weights
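A minimal NumPy sketch of these two predictive means (not code from the book); the RBF kernel, the noise variance, and all names below are assumptions:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gram matrix of an RBF kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def gp_predict(X, t, X_new, noise_var=0.1):
    # GP regression mean: k(x)^T (G + sigma_t^2 I)^{-1} t
    G = rbf_kernel(X, X)
    alpha = np.linalg.solve(G + noise_var * np.eye(len(X)), t)
    return rbf_kernel(X_new, X) @ alpha

def rvm_predict(X, t, X_new, theta, noise_var=0.1):
    # RVM mean in the kernel basis, with diagonal prior Theta = diag(theta)
    Phi = rbf_kernel(X, X)                      # design matrix of kernel basis functions
    Sigma = np.linalg.inv(Phi.T @ Phi / noise_var + np.diag(1.0 / theta))
    mu = Sigma @ Phi.T @ t / noise_var          # posterior mean of the expansion coefficients
    return rbf_kernel(X_new, X) @ mu
```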
● How can we learn the sparse vector θ?
  – To find the best θ, employ evidence maximization
  – The evidence is given explicitly by P(t | X, θ, σ_t²) = Normal(0, X Θ X^T + σ_t² I)
  – Derived update rules (App'x B.8) re-estimate θ and σ_t² iteratively
● Evidence Maximization
  – Interestingly, many of the θ_i decrease quickly toward zero, which leads to high sparsity in the weight vector w
  – For faster convergence, delete the ith column from the design matrix whenever θ_i falls below a pre-defined threshold
  – After termination, set w_i = 0 for all i with θ_i below the threshold; the remaining w_i are set equal to the corresponding values of the posterior mean μ (see the sketch below)
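A simplified sketch of this re-estimation loop with pruning. It uses the standard update rules from Tipping (2001) rather than the book's Appendix B.8 derivation, and the function name, initialization, and tolerance are assumptions:

```python
import numpy as np

def rvm_evidence_maximization(Phi, t, noise_var=0.1, n_iter=200, prune_tol=1e-6):
    """Sketch of RVM evidence maximization with pruning of collapsed theta_i.

    Phi: (m, n) design matrix (e.g. kernel basis functions), t: (m,) targets.
    Returns the kept column indices, their posterior mean weights, and theta.
    """
    m = Phi.shape[0]
    theta = np.ones(Phi.shape[1])          # prior variances theta_i
    keep = np.arange(Phi.shape[1])         # indices of columns still in the model
    for _ in range(n_iter):
        # Gaussian posterior over the currently kept weights
        Sigma = np.linalg.inv(Phi.T @ Phi / noise_var + np.diag(1.0 / theta))
        mu = Sigma @ Phi.T @ t / noise_var
        # Re-estimate theta_i and the noise variance (Tipping-style updates)
        gamma = 1.0 - np.diag(Sigma) / theta
        theta = mu ** 2 / np.maximum(gamma, 1e-12)
        noise_var = np.sum((t - Phi @ mu) ** 2) / max(m - gamma.sum(), 1e-12)
        # For faster convergence, drop columns whose prior variance has collapsed
        mask = theta > prune_tol
        Phi, theta, keep, mu = Phi[:, mask], theta[mask], keep[mask], mu[mask]
    # Remaining weights are set to the posterior mean; all pruned weights are 0
    return keep, mu, theta
```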
● Application to Classification
  – Consider latent target variables t_i behind the observed classes y_i ∈ {−1, +1}
  – Training objects: (x_1, y_1), ..., (x_m, y_m)
  – Test object: x_{m+1}
  – Compute the predictive distribution of t_{m+1} at the new object
    ● by applying a latent weight vector w to all the m+1 objects
    ● and marginalizing over all w, we get the predictive distribution P(t_{m+1} | x_1, ..., x_{m+1}, y_1, ..., y_m)
  – Note
  – As in the case of GPs, we cannot solve this analytically, because the posterior over the latent variables is no longer Gaussian
  – Laplace approximation: approximate this density by a Gaussian whose mean is the mode of the density and whose covariance is the inverse Hessian (of the negative log density) at that mode
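The Laplace approximation itself is easy to state in code. The sketch below is generic rather than the book's specific derivation: it locates the mode with Newton's method and uses the inverse Hessian there as the covariance; all function and argument names are assumptions.

```python
import numpy as np

def laplace_approximation(grad, hess, x0, n_iter=50, tol=1e-8):
    """Fit a Gaussian N(mode, H^{-1}) to a density given the gradient and Hessian
    of its negative log (grad, hess are callables); x0 is a starting point."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        step = np.linalg.solve(hess(x), grad(x))   # Newton step toward the mode
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    mean = x                                       # mode of the density
    cov = np.linalg.inv(hess(x))                   # inverse Hessian at the mode
    return mean, cov
```

For the classification case above, the negative log density would be that of the latent targets given the observed classes.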
● Kernel trick
  – Think about the RKHS generated by the kernel k
  – Then the ith component of the feature representation of a training object x_j is given by k(x_i, x_j)
  – Now, think about regression: the weights become the expansion coefficients α_i of the desired hyperplane, such that f(x) = Σ_i α_i k(x_i, x)
  – In this sense, all the training objects which have non-zero α_i are termed relevance vectors
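A small illustration of prediction from this expansion, keeping only the relevance vectors; it reuses the hypothetical `keep` and `mu` names from the earlier evidence-maximization sketch and is an assumption, not the book's code:

```python
import numpy as np

def rv_predict(relevance_X, alpha, X_new, kernel):
    # f(x) = sum_i alpha_i k(x_i, x), summing only over the relevance vectors
    return np.array([sum(a * kernel(xi, x) for a, xi in zip(alpha, relevance_X))
                     for x in X_new])

# Hypothetical usage, with `keep` and `mu` from the evidence-maximization sketch:
# y_hat = rv_predict(X[keep], mu, X_test,
#                    kernel=lambda a, b: np.exp(-np.sum((a - b) ** 2)))
```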
3.4 Bayes Point Machines
● [R. Herbrich, JMLR 2000]
● In GPs and RVMs, we tried to solve the classification problem via regression estimation
● Before, we assumed a prior distribution over weights and used the logit transformation to model the likelihood of the classes
● Now we try to model the classification likelihood directly
● Prior
  – For classification, only the spatial direction of the weight vector w matters; note that sign(⟨x, w⟩) = sign(⟨x, λw⟩) for all λ > 0
  – Thus we consider only the vectors on the unit sphere, ||w|| = 1
  – Then assume a uniform prior over this ball-shaped hypothesis space
● Likelihood
  – Use the PAC likelihood, based on the 0-1 loss: it assigns zero likelihood to any weight vector that misclassifies a training example
● Posterior
  – Remark: using the PAC likelihood, the posterior is uniform over version space (the set of weight vectors that classify the whole training sample correctly) and zero elsewhere
● Predictive distribution
  – In the two-class case, the Bayesian decision can be written as Bayes_z(x) = sign( E_{W|z}[ sign(⟨x, W⟩) ] )
    ● That is, the Bayes classification strategy performs majority voting involving all version space classifiers
    ● However, the expectation is hard to compute
    ● Hence we approximate it by a single classifier (see the sketch below)
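A toy contrast between the two strategies, assuming W_samples holds weight vectors drawn (approximately) from the posterior, i.e. uniformly from version space; both function names are assumptions:

```python
import numpy as np

def bayes_vote(W_samples, x):
    # Majority vote of version-space classifiers sign(<x, w>) over posterior samples w
    return np.sign(np.mean(np.sign(W_samples @ x)))

def single_classifier(W_samples, x):
    # Approximation by one classifier: the normalized mean of the samples (the Bayes point)
    w_bp = W_samples.mean(axis=0)
    w_bp /= np.linalg.norm(w_bp)
    return np.sign(w_bp @ x)
```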
  – That is, the Bayes point is the optimal projection of the Bayes classification strategy onto a single classifier w.r.t. generalization error
  – However, this also is intractable, because we would need to know the input distribution as well as the posterior
  – Another reasonable approximation: the center of mass of version space, w_cm = E_{W|z}[W] / ||E_{W|z}[W]||
● Now the Bayes classification of a new object equals the classification w.r.t. the single weight vector w_cm
● Estimate w_cm by MCMC sampling (the 'kernel billiard' algorithm)
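The kernel billiard algorithm itself is beyond a short sketch; as a naive stand-in for illustration only, the centre of mass can be estimated by rejection sampling on the unit sphere (function name and arguments are assumptions, and this is far less efficient than the actual algorithm):

```python
import numpy as np

def estimate_bayes_point(X, y, n_samples=100000, seed=0):
    # Draw directions uniformly from the unit sphere and average those lying in
    # version space (i.e. classifying every training example correctly).
    rng = np.random.default_rng(seed)
    kept = []
    for _ in range(n_samples):
        w = rng.standard_normal(X.shape[1])
        w /= np.linalg.norm(w)                 # uniform direction on the unit sphere
        if np.all(y * (X @ w) > 0):            # w is consistent with the training sample
            kept.append(w)
    if not kept:
        raise RuntimeError("no version-space sample found; increase n_samples")
    w_cm = np.mean(kept, axis=0)               # centre of mass of the sampled version space
    return w_cm / np.linalg.norm(w_cm)
```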