Christopher M. Bishop, Pattern Recognition and Machine Learning.

Outline IIntroduction to kernel methods SSupport vector machines (SVM) RRelevance vector machines (RVM) AApplications CConclusions 2

Supervised Learning
- In machine learning, applications in which the training data comprises examples of the input vectors along with their corresponding target vectors are called supervised learning.
- Example: a model y(x) trained on pairs (x, t) such as (1, 60, pass), (2, 53, fail), (3, 77, pass), (4, 34, fail), where the final field is the target output.

Classification
[Figure: a two-dimensional input space (x1, x2) divided by a decision boundary y(x) = 0; points with y > 0 are assigned t = +1 and points with y < 0 are assigned t = -1]

Regression
[Figure: training points (x, t) on the unit interval; the fitted curve predicts the target t for a new input x]

Linear Models
- Linear models for regression and classification have the form y(x) = w^T x, with input x and model parameters w.
- If we apply feature extraction φ, the model becomes y(x) = w^T φ(x).

Problems with Feature Space
- Why feature extraction? Working in high-dimensional feature spaces makes it possible to express complex functions.
- Problems:
  - computational cost (working with very large vectors)
  - the curse of dimensionality

Kernel Methods (1)
- Kernel function: an inner product in some feature space, k(x, x') = φ(x)^T φ(x'), which acts as a nonlinear similarity measure.
- Examples:
  - polynomial: k(x, x') = (x^T x' + c)^d
  - Gaussian: k(x, x') = exp(-||x - x'||^2 / 2σ^2)
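
As a concrete illustration (not from the slides), here is a minimal NumPy sketch of the two example kernels; the parameter values c, d, and sigma are arbitrary choices:

```python
import numpy as np

def polynomial_kernel(x, z, c=1.0, d=2):
    """k(x, z) = (x^T z + c)^d"""
    return (np.dot(x, z) + c) ** d

def gaussian_kernel(x, z, sigma=1.0):
    """k(x, z) = exp(-||x - z||^2 / (2 sigma^2))"""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

x = np.array([1.0, 2.0])
z = np.array([0.5, -1.0])
print(polynomial_kernel(x, z))  # similarity under the polynomial kernel
print(gaussian_kernel(x, z))    # similarity under the Gaussian kernel
```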

Kernel Methods (2)
- Many linear models can be reformulated using a "dual representation" in which the kernel function arises naturally.
- The dual form requires only inner products between data (input) points.
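
For concreteness (a reconstruction, not shown on the slide): in Bishop's dual representation of regularized least-squares regression (Section 6.1), the prediction is written purely in terms of kernel evaluations against the training inputs:

```latex
y(\mathbf{x}) = \mathbf{k}(\mathbf{x})^{\top} \left( \mathbf{K} + \lambda \mathbf{I}_N \right)^{-1} \mathbf{t},
\qquad K_{nm} = k(\mathbf{x}_n, \mathbf{x}_m), \quad k_n(\mathbf{x}) = k(\mathbf{x}, \mathbf{x}_n)
```

Only the N x N Gram matrix K and the vector k(x) of kernel values are needed; the feature map φ never appears explicitly.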

Kernel Methods (3)
- We can benefit from the kernel trick:
  - choosing a kernel function is equivalent to choosing φ, so there is no need to specify which features are being used
  - we save computation by never explicitly mapping the data to feature space, instead working out the inner product directly in the data space
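
A quick numerical check of this point, assuming a degree-2 polynomial kernel with zero offset on 2D inputs: the kernel value matches the dot product of the explicit degree-2 feature maps, so the map never has to be constructed:

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for a 2D input:
    # phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([0.5, -1.0])

implicit = np.dot(x, z) ** 2           # kernel trick: work in the data space
explicit = np.dot(phi(x), phi(z))      # same value via the explicit mapping
assert np.isclose(implicit, explicit)  # both equal 2.25
```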

Kernel Methods (4)
- Kernel methods exploit information about the inner products between data items.
- We can construct kernels indirectly by choosing a feature-space mapping φ, or directly by choosing a valid kernel function.
- A badly chosen kernel maps the data to a space with many irrelevant features, so some prior knowledge of the target is needed.

Kernel Methods (5)
- Two basic modules for kernel methods:
  - a general-purpose learning model
  - a problem-specific kernel function

Kernel Methods (6)
- Limitation: the kernel function k(x_n, x_m) must be evaluated for all possible pairs x_n and x_m of training points when making predictions for new data points.
- A sparse kernel machine instead makes predictions using only a subset of the training data points.

Outline IIntroduction to kernel methods SSupport vector machines (SVM) RRelevance vector machines (RVM) AApplications CConclusions 14

Support Vector Machines (1)
- Support vector machines are a system for efficiently training linear machines in kernel-induced feature spaces, while respecting the insights provided by generalization theory and exploiting optimization theory.
- Generalization theory describes how to control learning machines to prevent them from overfitting.

Support Vector Machines (2)
- To avoid overfitting, the SVM modifies the error function to a "regularized form" in which a hyperparameter λ balances the trade-off between data fit and model complexity (see the reconstruction below).
- The aim of the regularization term E_W is to restrict the estimated functions to smooth ones.
- As a side effect, the SVM obtains a sparse model.
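
The slide's formula did not survive transcription; a standard reconstruction in Bishop's notation, with a data-dependent error E_D and a quadratic regularizer E_W:

```latex
\widetilde{E}(\mathbf{w}) = E_D(\mathbf{w}) + \lambda\, E_W(\mathbf{w}),
\qquad E_W(\mathbf{w}) = \tfrac{1}{2} \|\mathbf{w}\|^{2}
```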

Support Vector Machines (3)
[Fig. 1: Architecture of the SVM]

SVM for Classification (1)
- The mechanism that prevents overfitting in classification is the "maximum margin classifier".
- The SVM is fundamentally a two-class classifier.

Maximum Margin Classifiers (1)
- The aim of classification is to find a (D-1)-dimensional hyperplane that separates the data in a D-dimensional space.
- 2D example: [Figure: two classes in the plane with a candidate separating line]

Maximum Margin Classifiers (2)
[Figure: the margin is the distance from the decision boundary to the closest data points, the support vectors]

Maximum Margin Classifiers (3)
[Figure: a small-margin boundary versus a large-margin boundary]

Maximum Margin Classifiers (4)
- Intuitively, the maximum-margin boundary is a "robust" solution: if we have made a small error in the location of the boundary, it gives us the least chance of causing a misclassification.
- The max-margin concept is usually justified using Vapnik's statistical learning theory.
- Empirically it works well.
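
The slides never state the optimization problem explicitly; in Bishop's notation (Section 7.1), maximizing the margin is equivalent to the quadratic program

```latex
\min_{\mathbf{w},\, b} \ \frac{1}{2} \|\mathbf{w}\|^{2}
\quad \text{subject to} \quad
t_n \left( \mathbf{w}^{\top} \phi(\mathbf{x}_n) + b \right) \ge 1, \qquad n = 1, \dots, N,
```

where the support vectors are exactly the training points for which the constraint holds with equality.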

SVM for Classification (2)
- After the optimization process, we obtain the prediction model y(x) = Σ_n a_n t_n k(x, x_n) + b, where (x_n, t_n) are the N training data.
- We find that a_n is zero except for the support vectors, so the model is sparse.
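
A minimal scikit-learn sketch of this sparsity (the dataset and parameter values are illustrative, not taken from the slides): after fitting, predictions depend only on the stored support vectors:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two Gaussian blobs as toy two-class data.
X, t = make_blobs(n_samples=200, centers=2, random_state=0)

# Gaussian (RBF) kernel SVM; gamma and C are arbitrary illustrative values.
model = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, t)

print("training points:", len(X))
print("support vectors:", len(model.support_vectors_))  # typically a small subset
```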

SVM for Classification (3)
[Fig. 2: data from two classes in two dimensions, showing contours of constant y(x) obtained from an SVM with a Gaussian kernel function]

SVM for Classification (4)
- For overlapping class distributions, the SVM allows some of the training points to be misclassified, at a penalty: the soft margin.
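
The penalty term shown on the slide is lost; the standard soft-margin objective (Bishop, Section 7.1.1) introduces slack variables ξ_n ≥ 0 that measure margin violations, with C controlling the penalty:

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \
C \sum_{n=1}^{N} \xi_n + \frac{1}{2} \|\mathbf{w}\|^{2}
\quad \text{subject to} \quad
t_n\, y(\mathbf{x}_n) \ge 1 - \xi_n, \quad \xi_n \ge 0
```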

SVM for Classification (5)
- For multiclass problems, several methods combine multiple two-class SVMs (see the sketch below):
  - one versus the rest
  - one versus one (more training time)
- [Fig. 3: problems that arise in multiclass classification using multiple SVMs]
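
A minimal sketch of the two combination schemes via scikit-learn's meta-estimators (illustrative; the slides do not prescribe an implementation):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, t = load_iris(return_X_y=True)  # K = 3 classes

ovr = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, t)  # trains K binary SVMs
ovo = OneVsOneClassifier(SVC(kernel="rbf")).fit(X, t)   # trains K(K-1)/2 binary SVMs

print(len(ovr.estimators_), len(ovo.estimators_))  # 3 and 3 for K = 3
```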

SVM for Regression (1)
- For regression problems, the mechanism that prevents overfitting is the "ε-insensitive error function".
- [Figure: a quadratic error function versus the ε-insensitive error function, which is flat inside ±ε]
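
Written out (a reconstruction; the slide's formula image is lost), the ε-insensitive error assigns zero cost inside the tube and linear cost outside:

```latex
E_{\varepsilon}\bigl(y(\mathbf{x}) - t\bigr) =
\begin{cases}
0, & \text{if } |y(\mathbf{x}) - t| < \varepsilon \\
|y(\mathbf{x}) - t| - \varepsilon, & \text{otherwise}
\end{cases}
```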

SVM for Regression (2)
[Fig. 4: the ε-tube around the regression curve; points inside the tube incur no error, points outside incur error |y(x) - t| - ε]

SVM for Regression (3)
- After the optimization process, we obtain the prediction model y(x) = Σ_n (a_n - â_n) k(x, x_n) + b.
- We find that the coefficients are zero except for the support vectors, so the model is sparse.
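
A minimal scikit-learn regression sketch (illustrative data and parameter values) showing the same sparsity: only points on or outside the ε-tube end up as support vectors:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 1.0, size=(50, 1)), axis=0)
t = np.sin(2 * np.pi * X).ravel() + 0.1 * rng.normal(size=50)  # noisy sine

# epsilon sets the tube half-width; C penalizes points outside the tube.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, t)
print("support vectors:", len(model.support_vectors_), "of", len(X))
```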

SVM for Regression (4)
[Fig. 5: regression results; the support vectors lie on the boundary of the tube or outside it]

Disadvantages
- The solution is not sparse enough, since the number of support vectors required typically grows linearly with the size of the training set.
- Predictions are not probabilistic.
- Estimating the error/margin trade-off parameter requires cross-validation, which wastes computation.
- The choice of kernel functions is limited: they must be valid (positive-definite) kernels.
- Multiclass classification is handled awkwardly, by combining two-class machines.

Outline IIntroduction to kernel methods SSupport vector machines (SVM) RRelevance vector machines (RVM) AApplications CConclusions 32

Relevance Vector Machines (1)
- The relevance vector machine (RVM) is a Bayesian sparse kernel technique that shares many of the characteristics of the SVM whilst avoiding its principal limitations.
- The RVM is based on a Bayesian formulation and provides posterior probabilistic outputs, as well as having much sparser solutions than the SVM.

Relevance Vector Machines (2)
- The RVM is intended to mirror the structure of the SVM while using a Bayesian treatment to remove its limitations: the kernel functions are simply treated as basis functions, rather than as inner products in some feature space.

Bayesian Inference
- Bayesian inference allows one to model uncertainty about the world and the outcomes of interest by combining common-sense (prior) knowledge with observational evidence.

Relevance Vector Machines (3)
- In the Bayesian framework, we place a prior distribution over w to avoid overfitting, governed by hyperparameters α that control the model parameters w (see below).
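
The slide's formula did not survive; in Tipping's formulation the prior factorizes over the weights, with a separate precision hyperparameter α_m for each weight w_m:

```latex
p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{m=1}^{M} \mathcal{N}\!\left( w_m \mid 0,\, \alpha_m^{-1} \right)
```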

Relevance Vector Machines (4)
- Goal: find the most probable hyperparameters α* and β* in order to compute the predictive distribution p(t_new | x_new, X, t, α*, β*) for a new input x_new, where X and t are the training inputs and their target values.
- α* and β* are obtained by maximizing the marginal likelihood p(t | X, α, β).
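
Spelled out (a reconstruction in Bishop's notation, Section 7.2): the marginal likelihood integrates out the weights, and the predictive distribution averages over the weight posterior at the fitted hyperparameters:

```latex
p(\mathbf{t} \mid \mathbf{X}, \boldsymbol{\alpha}, \beta)
  = \int p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta)\, p(\mathbf{w} \mid \boldsymbol{\alpha})\, d\mathbf{w},
\qquad
p(t_{\mathrm{new}} \mid \mathbf{x}_{\mathrm{new}}, \mathbf{X}, \mathbf{t})
  = \int p(t_{\mathrm{new}} \mid \mathbf{x}_{\mathrm{new}}, \mathbf{w}, \beta^{*})\,
         p(\mathbf{w} \mid \mathbf{X}, \mathbf{t}, \boldsymbol{\alpha}^{*}, \beta^{*})\, d\mathbf{w}
```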

Relevance Vector Machines (5)
- The RVM uses "automatic relevance determination" (ARD) to achieve sparsity, where α_m represents the precision of weight w_m.
- In the procedure of finding the α_m*, some α_m become infinite, which drives the corresponding w_m to zero; the training points that remain are the relevance vectors.
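
scikit-learn does not ship an RVM, but its ARDRegression applies the same per-weight precision prior to a linear model; a minimal sketch (illustrative data) of the sparsity ARD induces:

```python
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
# Only features 0 and 1 are relevant; the other 8 are noise.
t = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)

model = ARDRegression().fit(X, t)
print(np.round(model.coef_, 2))  # weights of irrelevant features driven to ~0
```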

Comparisons - Regression
[Figure: SVM and RVM regression results side by side; the RVM panel shows the one-standard-deviation band of its predictive distribution]

Comparisons - Regression (cont.)
[Figure: further regression comparison results]

Comparison - Classification
[Figure: classification results for the RVM and the SVM]

Comparison - Classification (cont.)
[Figure: further classification comparison results]

Comparisons
- The RVM is much sparser and makes probabilistic predictions.
- The RVM gives better generalization in regression.
- The SVM gives better generalization in classification.
- The RVM is computationally demanding during learning.

Outline IIntroduction to kernel methods SSupport vector machines (SVM) RRelevance vector machines (RVM) AApplications CConclusions 44

Applications (1)
- SVM for face detection

Applications (2)
[Figure from Marti Hearst, "Support Vector Machines", 1998]

Applications (3)
- In feature-matching-based object tracking, SVMs are used to detect false feature matches.
(Weiyu Zhu et al., "Tracking of Object with SVM Regression", 2001)

Applications (4)
- Recovering 3D human poses with the RVM.
(A. Agarwal and B. Triggs, "3D Human Pose from Silhouettes by Relevance Vector Regression", 2004)

Outline IIntroduction to kernel methods SSupport vector machines (SVM) RRelevance vector machines (RVM) AApplications CConclusions 49

Conclusions
- The SVM is a learning machine based on kernel methods and generalization theory that can perform binary classification and real-valued function approximation tasks.
- The RVM has the same functional form as the SVM but provides probabilistic predictions and sparser solutions.

References
- N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, Cambridge University Press, 2000.
- M. E. Tipping, "Sparse Bayesian Learning and the Relevance Vector Machine," Journal of Machine Learning Research, 1:211-244, 2001.

Underfitting and Overfitting
[Figure: an underfit model (too simple) and an overfit model (too complex), evaluated on new data]