Ch 7. Sparse Kernel Machines
Pattern Recognition and Machine Learning, C. M. Bishop, 2006.
Summarized by S. Kim
Biointelligence Laboratory, Seoul National University
http://bi.snu.ac.kr/

Contents
- Maximum Margin Classifiers
  - Overlapping Class Distributions
  - Relation to Logistic Regression
  - Multiclass SVMs
  - SVMs for Regression
- Relevance Vector Machines
  - RVM for Regression
  - Analysis of Sparsity
  - RVMs for Classification

Maximum Margin Classifiers
Problem setting
- Two-class classification using a linear model $y(\mathbf{x}) = \mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}) + b$
- Assume that the training data set is linearly separable
Support vector machine approach
- The decision boundary is chosen to be the one for which the margin, the distance to the closest training points (the support vectors), is maximized

Maximum Margin Solution
For all data points, $t_n y(\mathbf{x}_n) > 0$.
The distance of a point to the decision surface:
$\frac{t_n y(\mathbf{x}_n)}{\|\mathbf{w}\|} = \frac{t_n(\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n) + b)}{\|\mathbf{w}\|}$
The maximum margin solution:
$\arg\max_{\mathbf{w},b}\left\{\frac{1}{\|\mathbf{w}\|}\min_n\left[t_n(\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n) + b)\right]\right\}$
Rescaling w and b so that $t_n(\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n) + b) = 1$ for the closest point, this is equivalent to minimizing $\frac{1}{2}\|\mathbf{w}\|^2$ subject to $t_n(\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n) + b) \ge 1$ for all n.

Dual Representation
Introducing Lagrange multipliers $a_n \ge 0$:
$L(\mathbf{w}, b, \mathbf{a}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{n=1}^N a_n\left\{t_n(\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n) + b) - 1\right\}$
Minimum points satisfy the derivatives of L w.r.t. w and b equal 0:
$\mathbf{w} = \sum_{n=1}^N a_n t_n \boldsymbol{\phi}(\mathbf{x}_n), \qquad \sum_{n=1}^N a_n t_n = 0$
Dual representation: find the a that maximizes
$\tilde{L}(\mathbf{a}) = \sum_{n=1}^N a_n - \frac{1}{2}\sum_{n=1}^N\sum_{m=1}^N a_n a_m t_n t_m k(\mathbf{x}_n, \mathbf{x}_m)$
See Appendix E for more details.

Classifying New Data
New data are classified with $y(\mathbf{x}) = \sum_{n=1}^N a_n t_n k(\mathbf{x}, \mathbf{x}_n) + b$.
The optimization is subject to $a_n \ge 0$ and $\sum_n a_n t_n = 0$; a is found by solving a quadratic programming problem, typically at $O(N^3)$ cost.
Karush-Kuhn-Tucker (KKT) conditions → Appendix E:
$a_n \ge 0, \qquad t_n y(\mathbf{x}_n) - 1 \ge 0, \qquad a_n\{t_n y(\mathbf{x}_n) - 1\} = 0$
For every point, either $a_n = 0$ or $t_n y(\mathbf{x}_n) = 1$; the points with $a_n > 0$ are the support vectors, and
$b = \frac{1}{N_S}\sum_{n \in S}\left(t_n - \sum_{m \in S} a_m t_m k(\mathbf{x}_n, \mathbf{x}_m)\right)$

Example of Separable Data Classification
(Figure 7.2: synthetic two-class data in two dimensions, classified by an SVM with a Gaussian kernel; the support vectors lie on the maximum-margin boundaries.)

Overlapping Class Distributions
Allow some examples to be misclassified → soft margin.
Introduce slack variables $\xi_n \ge 0$, with $\xi_n = 0$ for points on or inside the correct margin boundary and $\xi_n = |t_n - y(\mathbf{x}_n)|$ for other points, so the constraints become $t_n y(\mathbf{x}_n) \ge 1 - \xi_n$.

Soft Margin Solution
Minimize
$C\sum_{n=1}^N \xi_n + \frac{1}{2}\|\mathbf{w}\|^2$
where $C > 0$ controls the trade-off between minimizing training errors and controlling model complexity.
KKT conditions:
$a_n \ge 0, \qquad t_n y(\mathbf{x}_n) - 1 + \xi_n \ge 0, \qquad a_n\{t_n y(\mathbf{x}_n) - 1 + \xi_n\} = 0$
$\mu_n \ge 0, \qquad \xi_n \ge 0, \qquad \mu_n \xi_n = 0$
The points with $a_n > 0$ are the support vectors.

Dual Representation
The dual representation is the same $\tilde{L}(\mathbf{a})$ as in the separable case, now maximized subject to the box constraints $0 \le a_n \le C$ and $\sum_n a_n t_n = 0$.
Classifying new data and obtaining b take the same form as for hard margin classifiers:
$y(\mathbf{x}) = \sum_n a_n t_n k(\mathbf{x}, \mathbf{x}_n) + b, \qquad b = \frac{1}{N_M}\sum_{n \in M}\left(t_n - \sum_{m \in S} a_m t_m k(\mathbf{x}_n, \mathbf{x}_m)\right)$
where M is the set of points with $0 < a_n < C$.
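To make the dual concrete, here is a minimal sketch that solves this box-constrained problem for a linear kernel with a generic SciPy optimizer; the function name, tolerances, and toy data are my own, and real SVM implementations use dedicated QP or SMO solvers (discussed below).

```python
import numpy as np
from scipy.optimize import minimize

def fit_soft_margin_dual(X, t, C=1.0):
    """Solve the soft-margin SVM dual (illustration only)."""
    N = len(t)
    K = X @ X.T                                   # linear kernel k(x_n, x_m)
    P = (t[:, None] * t[None, :]) * K             # t_n t_m k(x_n, x_m)

    def neg_dual(a):                              # minimize the negative dual
        return 0.5 * a @ P @ a - a.sum()

    res = minimize(neg_dual, np.zeros(N),
                   jac=lambda a: P @ a - np.ones(N),
                   method="SLSQP",
                   bounds=[(0.0, C)] * N,                       # 0 <= a_n <= C
                   constraints=[{"type": "eq", "fun": lambda a: a @ t}])
    a = res.x
    sv = a > 1e-6                                 # support vectors: a_n > 0
    margin = sv & (a < C - 1e-6)                  # points with 0 < a_n < C
    # b averaged over the margin points (assumes at least one exists)
    b = np.mean(t[margin] - (a[sv] * t[sv]) @ K[np.ix_(sv, margin)])
    return a, b

# Toy usage on four linearly separable points
X = np.array([[0., 0.], [1., 1.], [3., 3.], [4., 4.]])
t = np.array([-1., -1., 1., 1.])
a, b = fit_soft_margin_dual(X, t, C=10.0)
```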

Alternative Formulation
ν-SVM (Schölkopf et al., 2000): maximize
$\tilde{L}(\mathbf{a}) = -\frac{1}{2}\sum_{n}\sum_{m} a_n a_m t_n t_m k(\mathbf{x}_n, \mathbf{x}_m)$
subject to $0 \le a_n \le 1/N$, $\sum_n a_n t_n = 0$, and $\sum_n a_n \ge \nu$.
The parameter ν is
- an upper bound on the fraction of margin errors
- a lower bound on the fraction of support vectors

Example of Nonseparable Data Classification (ν-SVM)
(Figure 7.4: ν-SVM with a Gaussian kernel applied to a nonseparable two-dimensional data set; the support vectors are indicated by circles.)
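This kind of experiment is easy to reproduce with scikit-learn's NuSVC, which implements the ν-parameterization; the data set below is a made-up stand-in for the figure's.

```python
import numpy as np
from sklearn.svm import NuSVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs as a stand-in for the nonseparable data
X = np.vstack([rng.normal(-1, 1, size=(50, 2)),
               rng.normal(+1, 1, size=(50, 2))])
t = np.array([-1] * 50 + [+1] * 50)

clf = NuSVC(nu=0.25, kernel="rbf", gamma=0.5).fit(X, t)
# nu upper-bounds the fraction of margin errors and
# lower-bounds the fraction of support vectors:
print(len(clf.support_) / len(t))   # fraction of support vectors, >= nu
```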

Solutions of the QP Problem
- Chunking (Vapnik, 1982): exploits the fact that the value of the Lagrangian is unchanged if we remove the rows and columns of the kernel matrix corresponding to Lagrange multipliers that are zero
- Protected conjugate gradients (Burges, 1998)
- Decomposition methods (Osuna et al., 1996)
- Sequential minimal optimization (SMO) (Platt, 1999): the extreme case of decomposition, which updates just two Lagrange multipliers at a time so that each subproblem can be solved analytically (sketched below)
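To make the SMO idea concrete, here is a hedged sketch of its core step, the analytic update of one pair of multipliers under the constraint $\sum_n a_n t_n = 0$, following Platt's derivation; the helper name and tolerances are my own, and the pair-selection heuristics and error-cache maintenance are omitted.

```python
import numpy as np

def smo_pair_update(i, j, a, t, K, E, C):
    """One SMO step: optimize a[i], a[j] analytically, holding the rest fixed.
    E[n] = y(x_n) - t_n is the current prediction error; the caller must
    refresh E and b after a successful update."""
    if i == j:
        return False
    # Bounds L, H keep the pair feasible: 0 <= a <= C and sum_n a_n t_n fixed
    if t[i] == t[j]:
        L, H = max(0.0, a[i] + a[j] - C), min(C, a[i] + a[j])
    else:
        L, H = max(0.0, a[j] - a[i]), min(C, C + a[j] - a[i])
    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]       # curvature along the pair
    if L >= H or eta <= 0:
        return False
    aj_new = np.clip(a[j] + t[j] * (E[i] - E[j]) / eta, L, H)
    ai_new = a[i] + t[i] * t[j] * (a[j] - aj_new)  # restores sum a_n t_n
    a[i], a[j] = ai_new, aj_new
    return True
```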

Relation to Logistic Regression (Section 4.3.2)
For data points on the correct side of the margin boundary, $\xi_n = 0$; for the remaining points, $\xi_n = 1 - t_n y(\mathbf{x}_n)$. The objective can therefore be written
$\sum_{n=1}^N E_{SV}(y_n t_n) + \lambda\|\mathbf{w}\|^2, \qquad \lambda = (2C)^{-1}$
where $E_{SV}(yt) = [1 - yt]_+$ is the hinge error function and $[\cdot]_+$ denotes the positive part.

Relation to Logistic Regression (Cont’d)
For maximum likelihood logistic regression with targets $t \in \{-1, 1\}$ we have $p(t|y) = \sigma(yt)$, which leads to an error function with a quadratic regularizer of the same form:
$\sum_{n=1}^N E_{LR}(y_n t_n) + \lambda\|\mathbf{w}\|^2, \qquad E_{LR}(yt) = \ln(1 + \exp(-yt))$

Comparison of Error Functions
(Figure 7.5 plots, as functions of $yt$:)
- Hinge error function $E_{SV}$
- Error function for logistic regression $E_{LR}$, rescaled by $1/\ln 2$ so that it passes through the point (0, 1)
- Misclassification error
- Squared error
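The four curves are straightforward to tabulate directly; a small sketch (the rescaling of the logistic error follows the figure's convention, and the grid is arbitrary):

```python
import numpy as np

z = np.linspace(-2, 2, 401)                  # z = y * t
hinge = np.maximum(0.0, 1.0 - z)             # E_SV = [1 - yt]_+
logistic = np.log1p(np.exp(-z)) / np.log(2)  # E_LR, rescaled through (0, 1)
misclass = (z <= 0).astype(float)            # 0-1 loss
squared = (z - 1.0) ** 2                     # (y - t)^2 in terms of yt, t = +-1
```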

Multiclass SVMs
- One-versus-the-rest: K separate SVMs
  - Can lead to inconsistent results (Figure 4.2)
  - Suffers from imbalanced training sets; one remedy sets the targets to +1 for the positive class and -1/(K-1) for the negative classes (Lee et al., 2001)
  - A single objective function for training all K SVMs simultaneously (Weston and Watkins, 1999)
- One-versus-one: K(K-1)/2 SVMs
- Methods based on error-correcting output codes (Allwein et al., 2000): a generalization of the voting scheme of the one-versus-one approach
Both decompositions are sketched in code below.
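Both schemes are available off the shelf; a minimal sketch with scikit-learn, where the data set and kernel choice are placeholders:

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                         # K = 3 classes
ovr = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)    # K SVMs
ovo = OneVsOneClassifier(SVC(kernel="rbf")).fit(X, y)     # K(K-1)/2 SVMs
print(len(ovr.estimators_), len(ovo.estimators_))         # 3 and 3 here
```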

SVMs for Regression
Simple linear regression minimizes a regularized sum-of-squares error. To obtain sparse solutions, the quadratic error function is replaced by an ε-insensitive error function, which assigns zero error to points lying within a distance ε of the prediction:
$E_\epsilon(y(\mathbf{x}) - t) = \begin{cases}0, & |y(\mathbf{x}) - t| < \epsilon \\ |y(\mathbf{x}) - t| - \epsilon, & \text{otherwise}\end{cases}$
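In code the piecewise definition collapses to one line; a small numpy sketch (the default eps is arbitrary):

```python
import numpy as np

def eps_insensitive(residual, eps=0.1):
    """E_eps(y - t): zero inside the tube of half-width eps, linear outside."""
    return np.maximum(0.0, np.abs(residual) - eps)
```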

SVMs for Regression (Cont’d)
Introduce two slack variables for each data point, $\xi_n \ge 0$ for points lying above the ε-tube and $\hat{\xi}_n \ge 0$ for points lying below it, so that
$t_n \le y(\mathbf{x}_n) + \epsilon + \xi_n, \qquad t_n \ge y(\mathbf{x}_n) - \epsilon - \hat{\xi}_n$
Minimize
$C\sum_{n=1}^N(\xi_n + \hat{\xi}_n) + \frac{1}{2}\|\mathbf{w}\|^2$

Dual Problem
Introducing Lagrange multipliers $a_n \ge 0$ and $\hat{a}_n \ge 0$ and eliminating the primal variables gives the dual problem: maximize
$\tilde{L}(\mathbf{a}, \hat{\mathbf{a}}) = -\frac{1}{2}\sum_{n}\sum_{m}(a_n - \hat{a}_n)(a_m - \hat{a}_m)k(\mathbf{x}_n, \mathbf{x}_m) - \epsilon\sum_{n}(a_n + \hat{a}_n) + \sum_{n}(a_n - \hat{a}_n)t_n$
subject to $0 \le a_n \le C$, $0 \le \hat{a}_n \le C$, and $\sum_n(a_n - \hat{a}_n) = 0$.

Predictions
New predictions are made with $y(\mathbf{x}) = \sum_{n}(a_n - \hat{a}_n)k(\mathbf{x}, \mathbf{x}_n) + b$.
KKT conditions (from derivatives of the Lagrangian):
$a_n(\epsilon + \xi_n + y_n - t_n) = 0$
$\hat{a}_n(\epsilon + \hat{\xi}_n - y_n + t_n) = 0$
$(C - a_n)\xi_n = 0, \qquad (C - \hat{a}_n)\hat{\xi}_n = 0$
The support vectors are the points with $a_n \ne 0$ or $\hat{a}_n \ne 0$, which lie on or outside the ε-tube; b is obtained from any point with $0 < a_n < C$, for which $\xi_n = 0$ and hence $b = t_n - \epsilon - \mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n)$.

Alternative Formulation
ν-SVM regression (Schölkopf et al., 2000): instead of fixing the width ε of the tube, fix a parameter ν that bounds the fraction of points lying outside the tube. Maximize
$\tilde{L}(\mathbf{a}, \hat{\mathbf{a}}) = -\frac{1}{2}\sum_{n}\sum_{m}(a_n - \hat{a}_n)(a_m - \hat{a}_m)k(\mathbf{x}_n, \mathbf{x}_m) + \sum_{n}(a_n - \hat{a}_n)t_n$
subject to $0 \le a_n \le C/N$, $0 \le \hat{a}_n \le C/N$, $\sum_n(a_n - \hat{a}_n) = 0$, and $\sum_n(a_n + \hat{a}_n) \le \nu C$.

Example of ν-SVM Regression
(Figure 7.8: ν-SVM regression applied to the sinusoidal synthetic data set using Gaussian kernels, showing the predicted curve, the ε-insensitive tube, and the support vectors.)
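A minimal reproduction of this kind of experiment with scikit-learn's NuSVR; the data generation and hyperparameters are chosen arbitrarily.

```python
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 50))
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 50)   # noisy sinusoid

reg = NuSVR(nu=0.2, C=10.0, kernel="rbf", gamma=10.0).fit(x[:, None], t)
print(len(reg.support_))   # support vectors: points on or outside the tube
```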

Relevance Vector Machines
SVM:
- Outputs are decisions rather than posterior probabilities
- The extension to K > 2 classes is problematic
- There is a complexity parameter C
- Kernel functions are centered on training data points and required to be positive definite
RVM:
- A Bayesian sparse kernel technique
- Much sparser models
- Faster performance on test data

RVM for Regression
The RVM for regression is a linear model of the form studied in Chapter 3 with a modified prior:
$p(t|\mathbf{x}, \mathbf{w}, \beta) = \mathcal{N}(t\,|\,y(\mathbf{x}), \beta^{-1}), \qquad y(\mathbf{x}) = \sum_{n=1}^N w_n k(\mathbf{x}, \mathbf{x}_n) + b$
The prior introduces a separate hyperparameter $\alpha_i$ for each weight:
$p(\mathbf{w}|\boldsymbol{\alpha}) = \prod_{i=1}^M \mathcal{N}(w_i\,|\,0, \alpha_i^{-1})$

RVM for Regression (Cont’d)
From the result (3.49) for linear regression models, the posterior is
$p(\mathbf{w}|\mathbf{t}, \mathbf{X}, \boldsymbol{\alpha}, \beta) = \mathcal{N}(\mathbf{w}\,|\,\mathbf{m}, \boldsymbol{\Sigma})$
$\mathbf{m} = \beta\boldsymbol{\Sigma}\boldsymbol{\Phi}^T\mathbf{t}, \qquad \boldsymbol{\Sigma} = (\mathbf{A} + \beta\boldsymbol{\Phi}^T\boldsymbol{\Phi})^{-1}, \qquad \mathbf{A} = \mathrm{diag}(\alpha_i)$
α and β are determined using the evidence approximation (type-2 maximum likelihood) (Section 3.5): maximize
$\ln p(\mathbf{t}|\mathbf{X}, \boldsymbol{\alpha}, \beta) = -\frac{1}{2}\left\{N\ln(2\pi) + \ln|\mathbf{C}| + \mathbf{t}^T\mathbf{C}^{-1}\mathbf{t}\right\}, \qquad \mathbf{C} = \beta^{-1}\mathbf{I} + \boldsymbol{\Phi}\mathbf{A}^{-1}\boldsymbol{\Phi}^T$

RVM for Regression (Cont’d)
Two approaches to the maximization:
- Setting the derivatives of the marginal likelihood to zero and iterating the re-estimation equations
$\alpha_i^{\text{new}} = \frac{\gamma_i}{m_i^2}, \qquad \gamma_i = 1 - \alpha_i\Sigma_{ii}, \qquad (\beta^{\text{new}})^{-1} = \frac{\|\mathbf{t} - \boldsymbol{\Phi}\mathbf{m}\|^2}{N - \sum_i\gamma_i}$
- EM algorithm → Section 9.3.4
During optimization a proportion of the $\alpha_i$ are driven to infinity, and the corresponding basis functions are pruned. Predictive distribution → Section 3.3.2. A sketch of the first approach follows.
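A minimal sketch of the derivative-based re-estimation loop; the iteration count, pruning threshold, and numerical jitter are my own choices, and a practical implementation would prune basis functions inside the loop.

```python
import numpy as np

def rvm_regression(Phi, t, n_iter=200, alpha0=1.0, beta0=1.0, prune=1e9):
    """Evidence (type-2 ML) updates for RVM regression."""
    N, M = Phi.shape
    alpha = np.full(M, alpha0)
    beta = beta0
    for _ in range(n_iter):
        # Posterior over weights given the current alpha, beta
        Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
        m = beta * Sigma @ Phi.T @ t
        # Re-estimation equations: alpha_i = gamma_i / m_i^2
        gamma = 1.0 - alpha * np.diag(Sigma)
        alpha = gamma / (m ** 2 + 1e-12)
        beta = (N - gamma.sum()) / np.sum((t - Phi @ m) ** 2)
    relevant = alpha < prune        # basis functions that survive pruning
    return m, Sigma, alpha, beta, relevant
```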

Example of RVM Regression
RVM regression vs. ν-SVM regression on the sinusoidal data (Figure 7.9):
- The RVM model is much more compact than the SVM (fewer relevance vectors than support vectors)
- The parameters governing complexity and noise are determined automatically
- The RVM requires more training time than the SVM

Mechanism for Sparsity
(Figure 7.10: for a model with a single basis vector $\boldsymbol{\varphi}$, one panel shows only isotropic noise, $\alpha = \infty$, and the other a finite value of $\alpha$. When the data vector $\mathbf{t}$ is poorly aligned with $\boldsymbol{\varphi}$, the marginal likelihood is higher for $\alpha = \infty$, so the basis function is pruned.)

Sparse Solution
Pull out the contribution from $\alpha_i$ in $\mathbf{C}$:
$\mathbf{C} = \beta^{-1}\mathbf{I} + \sum_{j \ne i}\alpha_j^{-1}\boldsymbol{\varphi}_j\boldsymbol{\varphi}_j^T + \alpha_i^{-1}\boldsymbol{\varphi}_i\boldsymbol{\varphi}_i^T = \mathbf{C}_{-i} + \alpha_i^{-1}\boldsymbol{\varphi}_i\boldsymbol{\varphi}_i^T$
Using (C.7), (C.15) in Appendix C:
$|\mathbf{C}| = |\mathbf{C}_{-i}|\,(1 + \alpha_i^{-1}\boldsymbol{\varphi}_i^T\mathbf{C}_{-i}^{-1}\boldsymbol{\varphi}_i)$
$\mathbf{C}^{-1} = \mathbf{C}_{-i}^{-1} - \frac{\mathbf{C}_{-i}^{-1}\boldsymbol{\varphi}_i\boldsymbol{\varphi}_i^T\mathbf{C}_{-i}^{-1}}{\alpha_i + \boldsymbol{\varphi}_i^T\mathbf{C}_{-i}^{-1}\boldsymbol{\varphi}_i}$

Sparse Solution (Cont’d)
The log marginal likelihood function can then be written $L(\boldsymbol{\alpha}) = L(\boldsymbol{\alpha}_{-i}) + \lambda(\alpha_i)$, where
$\lambda(\alpha_i) = \frac{1}{2}\left[\ln\alpha_i - \ln(\alpha_i + s_i) + \frac{q_i^2}{\alpha_i + s_i}\right]$
Stationary points of the marginal likelihood w.r.t. $\alpha_i$:
$\alpha_i = \frac{s_i^2}{q_i^2 - s_i}$ if $q_i^2 > s_i$, and $\alpha_i = \infty$ otherwise.
- Sparsity $s_i = \boldsymbol{\varphi}_i^T\mathbf{C}_{-i}^{-1}\boldsymbol{\varphi}_i$: measures the extent to which $\boldsymbol{\varphi}_i$ overlaps with the other basis vectors
- Quality $q_i = \boldsymbol{\varphi}_i^T\mathbf{C}_{-i}^{-1}\mathbf{t}$: a measure of the alignment of the basis vector $\boldsymbol{\varphi}_i$ with the error between $\mathbf{t}$ and the prediction $\mathbf{y}_{-i}$ made with $\boldsymbol{\varphi}_i$ excluded

Sequential Sparse Bayesian Learning Algorithm
1. Initialize $\beta$.
2. Initialize using a single basis function $\boldsymbol{\varphi}_1$, with $\alpha_1 = s_1^2/(q_1^2 - s_1)$ and the remaining $\alpha_j = \infty$.
3. Evaluate $\boldsymbol{\Sigma}$ and $\mathbf{m}$, along with $q_i$ and $s_i$, for all basis functions.
4. Select a candidate basis function $\boldsymbol{\varphi}_i$.
5. If $q_i^2 > s_i$ and $\alpha_i < \infty$ ($\boldsymbol{\varphi}_i$ is already in the model), update $\alpha_i$.
6. If $q_i^2 > s_i$ and $\alpha_i = \infty$, add $\boldsymbol{\varphi}_i$ to the model, and evaluate $\alpha_i$.
7. If $q_i^2 \le s_i$ and $\alpha_i < \infty$, remove $\boldsymbol{\varphi}_i$ from the model, and set $\alpha_i = \infty$.
8. Update $\beta$ (for a regression model).
9. Go to 3 until converged.
A sketch of the per-candidate decision step follows.
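A hedged sketch of the core decision step, computing $s_i$ and $q_i$ directly from their definitions and applying the three cases above; direct inversion of $\mathbf{C}_{-i}$ is used for clarity, whereas the fast algorithm maintains these quantities incrementally.

```python
import numpy as np

def sparsity_quality(phi_i, C_minus_i, t):
    """s_i and q_i from their definitions (C_{-i} excludes basis function i)."""
    Cinv = np.linalg.inv(C_minus_i)
    s = phi_i @ Cinv @ phi_i   # sparsity: overlap with the other basis vectors
    q = phi_i @ Cinv @ t       # quality: alignment with the residual
    return s, q

def alpha_update(s, q):
    """Add or re-estimate when q^2 > s, otherwise prune (alpha -> infinity)."""
    if q ** 2 > s:
        return s ** 2 / (q ** 2 - s)
    return np.inf
```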

RVM for Classification
Probabilistic linear classification model (Chapter 4) with an ARD prior: $y(\mathbf{x}, \mathbf{w}) = \sigma(\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}))$
- Initialize $\boldsymbol{\alpha}$
- Build a Gaussian approximation to the posterior distribution
- Obtain an approximation to the marginal likelihood
- Maximize the marginal likelihood (re-estimate $\boldsymbol{\alpha}$) until converged

RVM for Classification (Cont’d)
The mode of the posterior distribution is obtained by maximizing
$\ln p(\mathbf{w}|\mathbf{t}, \boldsymbol{\alpha}) = \sum_{n=1}^N\{t_n\ln y_n + (1 - t_n)\ln(1 - y_n)\} - \frac{1}{2}\mathbf{w}^T\mathbf{A}\mathbf{w} + \text{const}$
using iterative reweighted least squares (IRLS) from Section 4.3.3. The resulting Gaussian approximation to the posterior distribution is
$\mathbf{w}^* = \mathbf{A}^{-1}\boldsymbol{\Phi}^T(\mathbf{t} - \mathbf{y}), \qquad \boldsymbol{\Sigma} = (\boldsymbol{\Phi}^T\mathbf{B}\boldsymbol{\Phi} + \mathbf{A})^{-1}$
where $\mathbf{B}$ is a diagonal matrix with elements $b_n = y_n(1 - y_n)$.

RVM for Classification (Cont’d)
Marginal likelihood using the Laplace approximation (Section 4.4):
$p(\mathbf{t}|\boldsymbol{\alpha}) \simeq p(\mathbf{t}|\mathbf{w}^*)\,p(\mathbf{w}^*|\boldsymbol{\alpha})\,(2\pi)^{M/2}|\boldsymbol{\Sigma}|^{1/2}$
Setting the derivative of the marginal likelihood with respect to $\alpha_i$ equal to zero, and rearranging, then gives
$\alpha_i^{\text{new}} = \frac{\gamma_i}{(w_i^*)^2}, \qquad \gamma_i = 1 - \alpha_i\Sigma_{ii}$
If we define $\hat{\mathbf{t}} = \boldsymbol{\Phi}\mathbf{w}^* + \mathbf{B}^{-1}(\mathbf{t} - \mathbf{y})$, the approximate marginal likelihood takes the same form as in the regression case, with noise covariance $\mathbf{B}^{-1}$, so the same analysis of sparsity applies.
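A minimal sketch of one outer iteration, finding the Laplace mode by Newton (IRLS) steps and then re-estimating $\boldsymbol{\alpha}$; the step counts, jitter, and function name are my own.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def rvm_class_outer_step(Phi, t, alpha, w, n_newton=20):
    """One outer RVM-classification iteration: find the Laplace mode of the
    posterior by Newton (IRLS) steps, then re-estimate alpha."""
    A = np.diag(alpha)
    for _ in range(n_newton):
        y = sigmoid(Phi @ w)
        grad = Phi.T @ (t - y) - A @ w               # d/dw ln p(w | t, alpha)
        H = Phi.T @ np.diag(y * (1 - y)) @ Phi + A   # negative Hessian
        w = w + np.linalg.solve(H, grad)
    y = sigmoid(Phi @ w)
    Sigma = np.linalg.inv(Phi.T @ np.diag(y * (1 - y)) @ Phi + A)
    gamma = 1.0 - alpha * np.diag(Sigma)
    return w, gamma / (w ** 2 + 1e-12)   # alpha_i = gamma_i / (w_i*)^2
```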

Example of RVM Classification
(Figure 7.12: the RVM applied to a synthetic two-class data set with a Gaussian kernel; unlike SVM support vectors, the relevance vectors do not lie on the decision boundary.)