Learning Kernel Classifiers, Chap. 3.3 Relevance Vector Machine and Chap. 3.4 Bayes Point Machines. Summarized by Sang Kyun Lee, 13th May, 2005.

3.3 Relevance Vector Machine
● [M. Tipping, JMLR 2001]
● A modification of the Gaussian process (GP) model:
  – GP: prior, likelihood, posterior
  – RVM: a new (sparsity-inducing) prior; the likelihood is the same as in the GP; the posterior changes accordingly

● Reasons
  – To obtain a sparse representation of the weight vector w
  – To keep the expected risk of the resulting classifier under control
● Thus, we favor weight vectors with a small number of non-zero coefficients.
  – One way to achieve this is to modify the prior to w ~ Normal(0, Θ) with Θ = diag(θ₁, …, θₙ)
  – Consider the limit θᵢ → 0: then wᵢ = 0 is the only possible value (see the sketch below)
  – With this diagonal prior, computation of the posterior is easier than before
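A tiny numpy illustration (my own, not from the book) of why the modified prior induces sparsity: a component whose prior variance θᵢ is driven toward zero can effectively only take the value wᵢ = 0.

```python
import numpy as np

# Toy sketch of the sparsity-inducing prior w ~ Normal(0, Theta),
# Theta = diag(theta_1, ..., theta_n): tiny theta_i pins w_i to zero.
rng = np.random.default_rng(0)

theta = np.array([1.0, 1.0, 1e-12, 1e-12])   # last two components effectively switched off
w_samples = rng.normal(loc=0.0, scale=np.sqrt(theta), size=(5, theta.size))

print(w_samples.round(3))   # columns with tiny theta are numerically zero
```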

● Prediction function
  – GP: the predictive mean is an expansion over all m training objects
  – RVM: only the training objects with non-zero weight contribute to the expansion

● How can we learn the vector θ that induces the sparsity?
  – To find the best θ, employ evidence maximization
  – The evidence is given explicitly by P(t | θ, σ²) = Normal(t; 0, σ²I + XΘXᵀ)
  – Update rules for θ and σ² are derived in Appendix B.8 of the book

● Evidence Maximization
  – Interestingly, many of the θᵢ decrease quickly toward zero, which leads to a high sparsity in w
  – For faster convergence, delete the ith column of the data matrix whenever θᵢ falls below a pre-defined threshold
  – After termination, set wᵢ = 0 for every i whose θᵢ is below the threshold; the remaining wᵢ are set equal to the corresponding components of the posterior mean (a sketch of the updates and pruning follows below)
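The exact update rules live in Appendix B.8, which the slide only points to. As a stand-in, here is a minimal numpy sketch of the standard Tipping-style re-estimation for RVM regression (posterior mean and covariance, update of the prior variances θᵢ and noise variance σ², plus the pruning step described above). The function name and the variance parameterization θᵢ = 1/αᵢ are my assumptions, not the book's notation.

```python
import numpy as np

def rvm_evidence_maximization(Phi, t, n_iter=200, prune_tol=1e-6, sigma2=0.1):
    """Sketch of RVM evidence maximization with pruning (variance parameterization)."""
    n, m = Phi.shape
    keep = np.arange(m)                  # indices of still-active basis functions
    theta = np.ones(m)                   # prior variances of the weights
    for _ in range(n_iter):
        Theta_inv = np.diag(1.0 / theta)
        Sigma = np.linalg.inv(Theta_inv + Phi.T @ Phi / sigma2)   # posterior covariance
        mu = Sigma @ Phi.T @ t / sigma2                           # posterior mean
        gamma = 1.0 - np.diag(Sigma) / theta                      # "well-determinedness" factors
        theta = mu**2 / np.maximum(gamma, 1e-12)                  # update rule for theta_i
        sigma2 = np.sum((t - Phi @ mu) ** 2) / max(n - gamma.sum(), 1e-12)
        active = theta > prune_tol       # prune columns whose theta collapsed toward zero
        Phi, theta, keep, mu = Phi[:, active], theta[active], keep[active], mu[active]
    return keep, mu, sigma2              # surviving (relevance) indices, their weights, noise
```

The training objects whose indices survive the pruning are exactly the relevance vectors referred to later in the slides.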

● Application to Classification
  – Consider latent (real-valued) target variables behind the binary class labels
  – Training objects: (x₁, y₁), …, (x_m, y_m)
  – Test object: x_{m+1}
  – Compute the predictive distribution of the label at the new object,
    ● by applying a latent weight vector to all the m+1 objects
    ● and marginalizing over all latent variables, we get the predictive distribution

– Note: as in the GP classification case, we cannot solve this integral analytically because the posterior is no longer Gaussian
– Laplace approximation: approximate this density by a Gaussian whose mean is the posterior mode and whose covariance is the inverse of the negative Hessian at the mode
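A minimal sketch of that Laplace step, assuming a logistic (sigmoid) likelihood, labels y ∈ {0, 1}, a design matrix X and the diagonal prior Normal(0, diag(θ)); the book's exact link function and notation may differ. The mode is found by Newton's method and the covariance is the inverse of the negative Hessian at the mode.

```python
import numpy as np

def laplace_approximation(X, y, theta, n_iter=50):
    """Gaussian (Laplace) approximation to the non-Gaussian posterior over w."""
    n, m = X.shape
    Theta_inv = np.diag(1.0 / theta)
    w = np.zeros(m)
    for _ in range(n_iter):                       # Newton / IRLS iterations
        p = 1.0 / (1.0 + np.exp(-X @ w))          # sigmoid of the latent values
        grad = X.T @ (y - p) - Theta_inv @ w      # gradient of the log posterior
        W = np.diag(p * (1.0 - p))                # likelihood curvature
        H = X.T @ W @ X + Theta_inv               # negative Hessian at current w
        w = w + np.linalg.solve(H, grad)
    return w, np.linalg.inv(H)                    # mode and covariance of the Gaussian
```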

● Kernel trick
  – Think about the RKHS generated by a kernel k
  – Then the ith training object xᵢ is represented by the function k(xᵢ, ·)
  – Now, think about regression: each weight wᵢ becomes an expansion coefficient of the desired hyperplane, such that f(x) = Σᵢ wᵢ k(xᵢ, x)
  – In this sense, all the training objects with a non-zero expansion coefficient are termed relevance vectors
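A small illustrative sketch of the resulting prediction function: only the relevance vectors (training objects with non-zero expansion coefficients) enter the sum. The RBF kernel and the function names are my illustrative choices, not prescribed by the book.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian RBF kernel between two input vectors (illustrative choice)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def rvm_predict(x_new, relevance_vectors, coeffs, kernel=rbf_kernel):
    """f(x) = sum_i w_i * k(x_i, x), summed over the relevance vectors only."""
    return sum(w * kernel(x_i, x_new) for x_i, w in zip(relevance_vectors, coeffs))
```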

3.4 Bayes Point Machines
● [R. Herbrich et al., JMLR 2001]
● In GPs and RVMs, we tried to solve the classification problem via regression estimation
● Before, we assumed a prior distribution over weight vectors and used logit transformations to model the likelihood
● Now we try to model the classification likelihood directly

● Prior
  – For classification, only the spatial direction of the weight vector w matters; note that sign(⟨w, x⟩) = sign(⟨λw, x⟩) for every λ > 0
  – Thus we consider only the vectors on the unit sphere
  – Then assume a uniform prior over this ball-shaped hypothesis space

● Likelihood
  – Use the PAC likelihood (0-1 loss): a weight vector gets likelihood one iff it classifies every training example correctly
● Posterior
  – Remark: using the PAC likelihood, the posterior is the uniform distribution over version space (the set of weight vectors consistent with the training sample)

● Predictive distribution
  – In the two-class case, the Bayesian decision can be written as Bayes(x) = sign( E_{w|data}[ sign(⟨w, x⟩) ] )
    ● That is, the Bayes classification strategy performs majority voting over all version-space classifiers
    ● However, the expectation is hard to compute
    ● Hence we approximate it by a single classifier (see the sketch below)
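A toy sketch of the two decision rules being contrasted here, assuming a matrix whose rows are (hypothetically obtained) version-space weight vectors: the exact Bayes decision is a majority vote, and the approximation replaces it with a single representative weight vector.

```python
import numpy as np

def bayes_vote(x, version_space_W):
    """Majority vote of all version-space classifiers on a new point x
    (the exact Bayes decision under the PAC likelihood)."""
    return np.sign(np.mean(np.sign(version_space_W @ x)))

def single_classifier(x, w_single):
    """The approximation: classify with one representative weight vector."""
    return np.sign(w_single @ x)
```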

– That is, the Bayes point (BP) is the optimal projection of the Bayes classifier onto a single classifier w.r.t. generalization error
– However, this too is intractable, because we would need to know the input distribution and the posterior
– Another reasonable approximation: the center of mass of version space, w_cm ∝ E_{w|data}[w]

● Now the Bayes classification of a new object reduces to the classification w.r.t. this single weight vector, sign(⟨w_cm, x⟩)
● Estimate w_cm by MCMC-style sampling (the 'kernel billiard' algorithm)
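The book estimates the center of mass with the kernel billiard algorithm. As a hedged substitute, here is a naive rejection-sampling sketch that is only feasible for a low-dimensional, linearly separable toy problem (it is not the book's algorithm): sample directions uniformly from the unit sphere, keep those lying in version space (PAC likelihood equal to one), and average them.

```python
import numpy as np

def bayes_point_rejection(X, y, n_samples=100_000, rng=None):
    """Crude rejection-sampling approximation of the Bayes point
    (center of mass of version space) for labels y in {-1, +1}."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    W = rng.normal(size=(n_samples, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)          # uniform directions on the sphere
    in_version_space = np.all((X @ W.T) * y[:, None] > 0, axis=0)
    consistent = W[in_version_space]                       # classifiers with zero training error
    if len(consistent) == 0:
        raise ValueError("no sampled classifier lies in version space")
    w_bp = consistent.mean(axis=0)
    return w_bp / np.linalg.norm(w_bp)                     # single weight vector ~ Bayes point

# Classification of a new object x then reduces to sign(w_bp @ x),
# matching the single-weight-vector rule on the previous slide.
```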