Machine Learning Seminar: Support Vector Regression. Presented by: Heng Ji, 10/08/03.

Outline
- Regression background
- Linear ε-insensitive loss algorithm: primal formulation, dual formulation, kernel formulation
- Quadratic ε-insensitive loss algorithm
- Kernel ridge regression & Gaussian processes

Regression = find a function that fits the observations. The observations are (x, y) pairs, e.g.: (1949, 100), (1950, 117), ..., (1996, 1462), (1997, 1469), (1998, 1467), (1999, 1474).

Linear fit... Not so good...

Better linear fit: take the logarithm of y and fit a straight line.

Transform back to the original scale. So-so...
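As an illustrative sketch (not from the original slides), fitting a straight line to log(y) with a few of the observations listed above could look like this:

```python
import numpy as np

# A few (year, value) observations from the earlier slide.
years = np.array([1949, 1950, 1996, 1997, 1998, 1999], dtype=float)
values = np.array([100, 117, 1462, 1469, 1467, 1474], dtype=float)

# Fit a straight line to log(y) instead of y itself.
coeffs = np.polyfit(years, np.log(values), deg=1)   # slope, intercept

# Transform back to the original scale to obtain the exponential fit.
predicted = np.exp(np.polyval(coeffs, years))
print(predicted.round(1))
```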

So what is regression about? Construct a model of a process, using examples of the process. Input: x (possibly a vector). Output: y (generated by the process). Examples: pairs of input and output {x, y}. Our model: the function f(x) is our estimate of the true function g(x).

Assumption about the process: the "fixed regressor model"
y(n) = g[x(n)] + ε(n),
where x(n) is the observed input, y(n) the observed output, g[x(n)] the true underlying function, and ε(n) an i.i.d. noise process with zero mean. Data set: {(x(n), y(n)), n = 1, ..., N}.

Example (figure).

Model sets (examples). True function: g(x) = x + x^2 + 6x^3. Candidate model sets: F1 = {a + bx} (linear); F2 = {a + bx + cx^2} (quadratic); F3 = {a + bx + cx^2 + dx^3} (cubic); with F1 ⊂ F2 ⊂ F3.

Idealized regression: given the true function g(x) and a model set F (our hypothesis set) containing a best approximation f_opt(x), find an appropriate model family F and a function f(x) ∈ F with minimum "distance" (error) to g(x).

How do we measure "distance"? Q: What is the distance (difference) between functions f and g?

Margin slack variable. For an example (x_i, y_i) and a function f, the margin slack variable is ξ_i = max(0, |y_i − f(x_i)| − (θ − γ)), where θ is the target accuracy in testing and γ is the difference between the target accuracy and the margin used in training.

ε-insensitive loss function. Let ε = θ − γ; the margin slack variable becomes ξ_i = max(0, |y_i − f(x_i)| − ε). Linear ε-insensitive loss: L_ε(y, f(x)) = |y − f(x)|_ε = max(0, |y − f(x)| − ε). Quadratic ε-insensitive loss: L_ε²(y, f(x)) = |y − f(x)|_ε².
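A minimal sketch (not part of the original slides) of the two loss functions in code:

```python
import numpy as np

def linear_eps_insensitive_loss(y, f_x, eps):
    """Linear ε-insensitive loss: zero inside the ε-tube, linear outside."""
    return np.maximum(0.0, np.abs(y - f_x) - eps)

def quadratic_eps_insensitive_loss(y, f_x, eps):
    """Quadratic ε-insensitive loss: zero inside the ε-tube, squared outside."""
    return np.maximum(0.0, np.abs(y - f_x) - eps) ** 2

# Example: a residual of 0.7 with eps = 0.5 costs about 0.2 (linear) or 0.04 (quadratic).
print(linear_eps_insensitive_loss(1.7, 1.0, 0.5))
print(quadratic_eps_insensitive_loss(1.7, 1.0, 0.5))
```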

Linear ε-insensitive loss in a linear SV machine (figure: the ε-tube around the regression function; points outside the tube receive slacks ξ, ξ*).

Basic idea of SV regression.
- Starting point: we have input data X = {(x_1, y_1), ..., (x_N, y_N)}.
- Goal: find a robust function f(x) that has at most ε deviation from the targets y, while at the same time being as flat as possible.
- Idea: simple regression problem + optimization + kernel trick.

Thus, with the linear ε-insensitive loss and a linear model f(x) = ⟨w, x⟩ + b, we obtain the primal regression problem.

Linear ε-insensitive loss regression:
min over w, b, ξ, ξ*:  (1/2)||w||² + C Σ_i (ξ_i + ξ_i*)
subject to:  y_i − ⟨w, x_i⟩ − b ≤ ε + ξ_i;  ⟨w, x_i⟩ + b − y_i ≤ ε + ξ_i*;  ξ_i, ξ_i* ≥ 0.
ε decides the insensitive zone; C sets a trade-off between the training error and ||w||; ε and C must be tuned simultaneously. Is regression therefore more difficult than classification?
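For a concrete sense of these two knobs, here is a small sketch (not from the original slides) using scikit-learn's SVR, whose C and epsilon parameters correspond to the trade-off and the insensitive zone described above:

```python
import numpy as np
from sklearn.svm import SVR

# Toy 1-D data: a noisy sine curve.
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 80).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(80)

# epsilon sets the width of the insensitive tube, C the error/flatness trade-off.
model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
model.fit(X, y)

print("number of support vectors:", len(model.support_))
print("prediction at x=3:", model.predict([[3.0]]))
```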

Parameters used in SV Regression

Dual formulation. The Lagrangian function will help us formulate the dual problem. Notation: ε is the insensitivity parameter of the loss; β_i, β_i* are the Lagrange multipliers; ξ_i is the slack for points above the ε-band and ξ_i* the slack for points below it. The optimality conditions follow from setting the derivatives of the Lagrangian to zero.

Dual formulation (cont'd): the resulting dual problem and how it is solved (see the standard form below).
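The slide's equations are not in the transcript; for reference, the standard dual of the linear ε-insensitive SVR problem, in the β notation used above, is:

```latex
\begin{aligned}
\max_{\beta,\beta^*}\quad
& \sum_{i=1}^{m} y_i(\beta_i-\beta_i^*)
  \;-\; \varepsilon\sum_{i=1}^{m}(\beta_i+\beta_i^*)
  \;-\; \tfrac{1}{2}\sum_{i,j=1}^{m}(\beta_i-\beta_i^*)(\beta_j-\beta_j^*)\langle x_i,x_j\rangle \\
\text{s.t.}\quad
& \sum_{i=1}^{m}(\beta_i-\beta_i^*) = 0,
\qquad 0 \le \beta_i,\ \beta_i^* \le C,\quad i=1,\dots,m,
\end{aligned}
```

with the regression function recovered as f(x) = Σ_i (β_i − β_i*)⟨x_i, x⟩ + b.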

KKT optimality conditions and b. From the KKT conditions, b can be computed from any point with 0 < β_i < C (or 0 < β_i* < C), e.g. b = y_i − ⟨w, x_i⟩ − ε when 0 < β_i < C. The conditions also mean that the Lagrange multipliers are non-zero only for points outside the ε-band; these points are the support vectors.
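The KKT conditions themselves are not reproduced in the transcript; the standard complementarity conditions for this primal/dual pair are:

```latex
\begin{aligned}
& \beta_i\,\bigl(\varepsilon + \xi_i - y_i + \langle w, x_i\rangle + b\bigr) = 0, \qquad
  \beta_i^*\,\bigl(\varepsilon + \xi_i^* + y_i - \langle w, x_i\rangle - b\bigr) = 0, \\
& (C - \beta_i)\,\xi_i = 0, \qquad (C - \beta_i^*)\,\xi_i^* = 0, \qquad
  \beta_i\,\beta_i^* = 0 .
\end{aligned}
```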

The idea of SVM (figure: a nonlinear mapping from input space to feature space).

Kernel version. Why can we use a kernel? The complexity of the function's representation depends only on the number of SVs, and the complete algorithm can be described in terms of inner products, so an implicit mapping to the feature space can be performed via a kernel.
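As an illustrative sketch (the function names and the RBF choice are mine, not the slides'), the kernelized regression function f(x) = Σ_i (β_i − β_i*) K(x_i, x) + b only ever touches the data through kernel evaluations:

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def svr_predict(x, support_vectors, dual_coef, b, gamma=1.0):
    """Evaluate f(x) = sum_i (beta_i - beta_i^*) k(x_i, x) + b.

    dual_coef[i] holds the difference beta_i - beta_i^* for support vector i.
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coef)) + b

# Tiny usage example with made-up support vectors and coefficients.
svs = [np.array([0.0]), np.array([1.0]), np.array([2.0])]
coef = [0.5, -0.2, 0.7]
print(svr_predict(np.array([1.5]), svs, coef, b=0.1))
```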

Quadratic ε-insensitive loss regression problem:
min over w, b, ξ, ξ*:  (1/2)||w||² + C Σ_i (ξ_i² + ξ_i*²)
subject to:  y_i − ⟨w, x_i⟩ − b ≤ ε + ξ_i;  ⟨w, x_i⟩ + b − y_i ≤ ε + ξ_i*.
Kernel formulation: see the standard form below.
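The transcript omits the kernel formulation; for the primal scaling written above, one standard form of the dual is:

```latex
\begin{aligned}
\max_{\beta,\beta^*\ge 0}\quad
& \sum_{i=1}^{m} y_i(\beta_i-\beta_i^*)
  - \varepsilon\sum_{i=1}^{m}(\beta_i+\beta_i^*)
  - \tfrac{1}{2}\sum_{i,j=1}^{m}(\beta_i-\beta_i^*)(\beta_j-\beta_j^*)\,K(x_i,x_j)
  - \tfrac{1}{4C}\sum_{i=1}^{m}\bigl(\beta_i^2+\beta_i^{*2}\bigr) \\
\text{s.t.}\quad
& \sum_{i=1}^{m}(\beta_i-\beta_i^*) = 0 .
\end{aligned}
```

Since β_i β_i* = 0 at the optimum, the last term can be absorbed by adding 1/(2C) to the diagonal of the kernel matrix, which is the "kernel formulation" referred to in the slide title.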

Kernel ridge regression & Gaussian processes. With ε = 0, the quadratic loss reduces to least-squares linear regression; the weight-decay factor is controlled by C. The problem becomes min λ||w||² + Σ_i ξ_i² (with λ ~ 1/C) subject to y_i − ⟨w, x_i⟩ = ξ_i. Kernel formulation: the dual solution is α = (K + λI)⁻¹ y (I: identity matrix), giving f(x) = y^T (K + λI)⁻¹ k(x) with k(x) = (K(x_1, x), ..., K(x_m, x)); this is also the mean of the Gaussian process predictive distribution.
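A minimal sketch of that closed-form solution (variable names are mine, not the slides'):

```python
import numpy as np

def rbf_kernel_matrix(A, B, gamma=1.0):
    """Pairwise Gaussian RBF kernel matrix between rows of A and rows of B."""
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2 * A @ B.T)
    return np.exp(-gamma * d2)

def kernel_ridge_fit_predict(X, y, X_test, lam=0.1, gamma=1.0):
    """Kernel ridge regression: alpha = (K + lam*I)^-1 y, then f(x) = k(x)^T alpha."""
    K = rbf_kernel_matrix(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return rbf_kernel_matrix(X_test, X, gamma) @ alpha

# Usage on toy data.
X = np.linspace(0, 6, 50).reshape(-1, 1)
y = np.sin(X).ravel()
print(kernel_ridge_fit_predict(X, y, np.array([[3.0]])))
```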

Architecture of the SV regression machine: the prediction f(x) = Σ_i (β_i − β_i*) K(x_i, x) + b is similar to regression in a three-layered neural network!? (Figure: inputs, kernel units on the support vectors, weighted sum plus bias b.)

Conclusion
- SVM is a useful alternative to neural networks.
- Two key concepts of SVM: optimization and the kernel trick.
- Advantages of SV regression: the solution is represented by a small subset of training points; the existence of a global minimum is ensured; the optimization of a reliable generalization bound is ensured.

Discussion 1: influence of the insensitivity band on regression quality. 17 measured training data points are used. Left: ε = 0.1, 15 SVs are chosen. Right: ε = 0.5, the 6 chosen SVs produce a much better regression function.

Discussion 2: ε-insensitive loss.
- Enables sparseness in the SVs, but does it guarantee sparseness?
- Robust (robust to small changes in data/model).
- Less sensitive to outliers.