Machine Learning (WS 13/14) – Lecture 15: Regression II. 16.12.2013, Bastian Leibe, RWTH Aachen.


Announcements
Lecture evaluation
- Please fill out the forms...

Course Outline
Fundamentals (2 weeks)
- Bayes Decision Theory
- Probability Density Estimation
Discriminative Approaches (5 weeks)
- Linear Discriminant Functions
- Statistical Learning Theory & SVMs
- Ensemble Methods & Boosting
- Decision Trees & Randomized Trees
- Model Selection
- Regression Problems
Generative Models (4 weeks)
- Bayesian Networks
- Markov Random Fields

Recap: Least-Squares Regression
We are given
- Training data points: x_1, ..., x_N
- Associated function values: t_1, ..., t_N
Start with a linear regressor:
- Try to enforce y(x_n; w) = t_n for all n.
- One linear equation for each training data point / label pair.
- Same basic setup as in least-squares classification!
- Only the target values are now continuous.
Closed-form solution (via the pseudo-inverse).
Slide credit: Bernt Schiele

Recap: Linear Basis Function Models
Generally, we consider models of the form
    y(x, w) = Σ_j w_j φ_j(x)
- where the φ_j(x) are known as basis functions.
- In the simplest case, we use linear basis functions: φ_d(x) = x_d.
Other popular basis functions: polynomial, Gaussian, sigmoid.
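To make the basis-function view concrete, here is a minimal numpy sketch (not from the lecture): it builds a design matrix from Gaussian basis functions plus a bias term and fits the weights with the least-squares closed-form solution. The toy data, centers, and width are illustrative choices.

```python
import numpy as np

def gaussian_design_matrix(x, centers, s=0.5):
    """Phi[n, j] = exp(-(x_n - mu_j)^2 / (2 s^2)), plus a constant bias column."""
    phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * s ** 2))
    return np.hstack([np.ones((len(x), 1)), phi])   # bias term phi_0(x) = 1

# Toy 1D data: noisy sine
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 25)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.shape)

centers = np.linspace(0, 1, 9)
Phi = gaussian_design_matrix(x, centers)

# Least-squares solution w = (Phi^T Phi)^{-1} Phi^T t, computed via the pseudo-inverse
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
y_train = Phi @ w
print("training RMSE:", np.sqrt(np.mean((y_train - t) ** 2)))
```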

Recap: Regularization
Problem: Overfitting
- Especially for small datasets and complex models.
- Observation: the model coefficients get very large.
Workaround: Regularization
- Penalize large coefficient values.
- This form with a quadratic regularizer is called Ridge Regression.
- Closed-form solution (see the sketch below).
Effect of regularization: keeps the inverse well-conditioned.
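As a small sketch of the ridge-regression closed form (the regularization strength lambda below is an arbitrary example value):

```python
import numpy as np

def ridge_fit(Phi, t, lam=1e-3):
    """Closed-form ridge solution w = (lambda*I + Phi^T Phi)^{-1} Phi^T t."""
    d = Phi.shape[1]
    # The lambda*I term keeps the matrix well-conditioned and shrinks the coefficients.
    return np.linalg.solve(lam * np.eye(d) + Phi.T @ Phi, Phi.T @ t)

# Usage, e.g. with Phi and t from the previous sketch:
# w_ridge = ridge_fit(Phi, t, lam=1e-3)
```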

Recap: Regularized Least-Squares
Consider more general regularization functions
- "L_q norms": regularizers of the form (λ/2) Σ_j |w_j|^q
Effect: sparsity for q ≤ 1.
- Minimization tends to set many coefficients to exactly zero.
Image source: C.M. Bishop, 2006

Recap: The Lasso
L1 regularization ("The Lasso")
Interpretation as Bayes estimation
- We can think of |w_j|^q as the negative log-prior density for w_j.
Prior for the Lasso (q = 1): Laplacian distribution
    p(w | τ) = (1/(2τ)) exp(−|w| / τ), with the scale τ determined by the regularization strength λ.
Image source: Wikipedia
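A small illustrative comparison (not from the slides) of the sparsity effect, using scikit-learn's Ridge and Lasso estimators on a toy problem where only a few features matter; the alpha values are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(1)
# 50 samples, 20 features, only 3 of which actually matter
X = rng.standard_normal((50, 20))
w_true = np.zeros(20)
w_true[[2, 7, 11]] = [1.5, -2.0, 0.8]
t = X @ w_true + 0.1 * rng.standard_normal(50)

ridge = Ridge(alpha=1.0).fit(X, t)
lasso = Lasso(alpha=0.1).fit(X, t)

# L2 shrinks coefficients but rarely zeroes them; L1 sets many to exactly zero.
print("non-zero ridge coefficients:", np.sum(np.abs(ridge.coef_) > 1e-6))
print("non-zero lasso coefficients:", np.sum(np.abs(lasso.coef_) > 1e-6))
```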

Topics of This Lecture
Kernel Methods for Regression
- Dual representation
- Kernel Ridge Regression
Support Vector Regression
- Recap: SVMs for classification
- Error function
- Primal form
- Dual form
Randomized Regression Forests
- Regression trees
- Optimization functions for regression

Kernel Methods for Regression
Dual representations
- Many linear models for regression and classification can be reformulated in terms of a dual representation, in which the predictions are based on linear combinations of a kernel function evaluated at the training data points.
- For models based on a fixed nonlinear feature space mapping φ(x), the kernel function is given by
    k(x, x') = φ(x)^T φ(x')
- We have seen that by substituting the inner product with the kernel, we can obtain interesting extensions of many well-known algorithms...
- Let's try this for regression as well.

Dual Representations: Derivation
Consider a regularized linear regression model
    J(w) = (1/2) Σ_n (w^T φ(x_n) − t_n)^2 + (λ/2) w^T w
with the solution
    w = −(1/λ) Σ_n (w^T φ(x_n) − t_n) φ(x_n)
- We can write this as a linear combination of the φ(x_n) with coefficients that are functions of w:
    w = Φ^T a,   with   a_n = −(1/λ) (w^T φ(x_n) − t_n)

Dual Representations: Derivation
Dual definition
- Instead of working with w, we can formulate the optimization for a by substituting w = Φ^T a into J(w):
    J(a) = (1/2) a^T Φ Φ^T Φ Φ^T a − a^T Φ Φ^T t + (1/2) t^T t + (λ/2) a^T Φ Φ^T a
- Define the kernel matrix K = Φ Φ^T with elements
    K_nm = φ(x_n)^T φ(x_m) = k(x_n, x_m)
- Now the sum-of-squares error can be written entirely in terms of K:
    J(a) = (1/2) a^T K K a − a^T K t + (1/2) t^T t + (λ/2) a^T K a

Kernel Ridge Regression
- Solving for a, we obtain
    a = (K + λ I_N)^{-1} t
Prediction for a new input x:
- Writing k(x) for the vector with elements k_n(x) = k(x_n, x), the prediction is
    y(x) = k(x)^T a = k(x)^T (K + λ I_N)^{-1} t
- The dual formulation allows the solution to be expressed entirely in terms of the kernel function k(x, x').
- The resulting model is known as Kernel Ridge Regression and allows us to perform non-linear regression.
Image source: Christoph Lampert
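A minimal numpy sketch of kernel ridge regression with an RBF kernel, following the formulas above; the kernel width gamma and the regularization strength lam are illustrative values.

```python
import numpy as np

def rbf_kernel(A, B, gamma=10.0):
    """k(x, x') = exp(-gamma * ||x - x'||^2) for all pairs of rows in A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def kernel_ridge_fit(X, t, lam=1e-2, gamma=10.0):
    K = rbf_kernel(X, X, gamma)
    # Dual solution a = (K + lambda * I)^{-1} t
    return np.linalg.solve(K + lam * np.eye(len(X)), t)

def kernel_ridge_predict(X_train, a, X_new, gamma=10.0):
    # y(x) = k(x)^T a, with k_n(x) = k(x_n, x)
    return rbf_kernel(X_new, X_train, gamma) @ a

# Toy 1D example: noisy sine
rng = np.random.default_rng(0)
X = np.linspace(0, 1, 30)[:, None]
t = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(30)

a = kernel_ridge_fit(X, t)
y = kernel_ridge_predict(X, a, np.array([[0.25], [0.75]]))
print(y)  # should be close to sin(pi/2) = 1 and sin(3*pi/2) = -1
```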

Topics of This Lecture
Kernel Methods for Regression
- Dual representation
- Kernel Ridge Regression
Support Vector Regression
- Recap: SVMs for classification
- Error function
- Primal form
- Dual form
Randomized Regression Forests
- Regression trees
- Optimization functions for regression

Recap: SVMs for Classification
Traditional soft-margin formulation
    min_{w, ξ} (1/2) ||w||^2 + C Σ_n ξ_n      ("maximize the margin")
subject to the constraints
    t_n y(x_n) ≥ 1 − ξ_n,   ξ_n ≥ 0      ("most points should be on the correct side of the margin")
Different way of looking at it
- We can reformulate the constraints into the objective function:
    min_w (1/2) ||w||^2 + C Σ_n [1 − t_n y(x_n)]_+      (L2 regularizer + "hinge loss")
  where [x]_+ := max{0, x}.
Slide adapted from Christoph Lampert

SVM – Discussion
SVM optimization function
    min_w (1/2) ||w||^2 + C Σ_n [1 − t_n y(x_n)]_+      (L2 regularizer + hinge loss)
The hinge loss enforces sparsity
- Only a subset of the training data points actually influences the decision boundary.
- This is different from the sparsity obtained through the regularizer! There, only a subset of the input dimensions is used.
- But we can now also apply different regularizers, e.g.
  - L1-SVMs
  - L0-SVMs
Slide adapted from Christoph Lampert

SVMs for Regression
Linear regression
- Minimize a regularized quadratic error function:
    (1/2) Σ_n (y(x_n) − t_n)^2 + (λ/2) ||w||^2
Problem
- Sensitive to outliers, because the quadratic error function penalizes large residuals heavily.
- This is the case even for (Kernel) Ridge Regression, although regularization helps.
Image source: C.M. Bishop, C. Lampert

SVMs for Regression
Obtaining sparse solutions
- Define an ε-insensitive error function
    E_ε(y(x) − t) = 0  if |y(x) − t| < ε,   and   |y(x) − t| − ε  otherwise
- and minimize the following regularized function:
    C Σ_n E_ε(y(x_n) − t_n) + (1/2) ||w||^2
Image source: C.M. Bishop
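The ε-insensitive loss is simple to write down directly; a short sketch comparing it with the quadratic loss (the ε value and the evaluation points are arbitrary):

```python
import numpy as np

def eps_insensitive(residual, eps=0.1):
    """E_eps(r) = 0 if |r| < eps, else |r| - eps: small deviations cost nothing."""
    return np.maximum(np.abs(residual) - eps, 0.0)

r = np.linspace(-1, 1, 5)
print(eps_insensitive(r))  # zero inside the eps-tube, grows only linearly outside
print(0.5 * r**2)          # the quadratic loss penalizes large residuals much more strongly
```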

Dealing with Noise and Outliers
Introduce slack variables
- We now need two slack variables, ξ_n ≥ 0 and ξ̂_n ≥ 0.
- A target point lies inside the ε-tube if y_n − ε ≤ t_n ≤ y_n + ε.
- The corresponding conditions are
    t_n ≤ y_n + ε + ξ_n   and   t_n ≥ y_n − ε − ξ̂_n
Image source: C.M. Bishop

Dealing with Noise and Outliers
Optimization with slack variables
- The error function can then be rewritten as
    C Σ_n (ξ_n + ξ̂_n) + (1/2) ||w||^2
- Using the conditions for the slack variables, this has to be minimized subject to
    ξ_n ≥ 0,   ξ̂_n ≥ 0,   t_n ≤ y(x_n) + ε + ξ_n,   t_n ≥ y(x_n) − ε − ξ̂_n

Support Vector Regression – Primal Form
Lagrangian primal form
    L = C Σ_n (ξ_n + ξ̂_n) + (1/2) ||w||^2 − Σ_n (μ_n ξ_n + μ̂_n ξ̂_n)
        − Σ_n a_n (ε + ξ_n + y_n − t_n) − Σ_n â_n (ε + ξ̂_n − y_n + t_n)
Solving for the variables (setting the derivatives w.r.t. w, b, ξ_n, ξ̂_n to zero):
    w = Σ_n (a_n − â_n) φ(x_n)
    Σ_n (a_n − â_n) = 0
    a_n + μ_n = C,   â_n + μ̂_n = C

Support Vector Regression – Dual Form
From this, we can derive the dual form
- Maximize
    L̃(a, â) = −(1/2) Σ_n Σ_m (a_n − â_n)(a_m − â_m) k(x_n, x_m)
               − ε Σ_n (a_n + â_n) + Σ_n (a_n − â_n) t_n
- under the conditions
    0 ≤ a_n ≤ C,   0 ≤ â_n ≤ C
- Predictions for new inputs are then made using
    y(x) = Σ_n (a_n − â_n) k(x, x_n) + b
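In practice one rarely solves this quadratic program by hand. As a usage sketch (not part of the lecture), scikit-learn's SVR solves the dual problem above; the choices of C, epsilon, and the RBF kernel width are illustrative.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 40))[:, None]
t = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(40)

svr = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=10.0).fit(X, t)

# Only the support vectors (points on or outside the eps-tube) carry non-zero
# dual coefficients a_n - a_hat_n and hence determine the prediction y(x).
print("number of support vectors:", len(svr.support_))
print(svr.predict(np.array([[0.25], [0.75]])))
```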

KKT Conditions
KKT conditions
    a_n (ε + ξ_n + y_n − t_n) = 0
    â_n (ε + ξ̂_n − y_n + t_n) = 0
    (C − a_n) ξ_n = 0
    (C − â_n) ξ̂_n = 0
Observations
- A coefficient a_n can only be non-zero if the first constraint is active, i.e. if a point lies either on or above the ε-tube.
- Similarly, a non-zero coefficient â_n requires the point to lie on or below the ε-tube.
- The first two constraints cannot both be active at the same time: either a_n or â_n (or both) must be zero.
- The support vectors are those points for which a_n > 0 or â_n > 0, i.e. the points on the boundary of or outside the ε-tube.
Image source: C.M. Bishop

Discussion
Slightly different interpretation than for classification
- For SVMs, the classification function depends only on the support vectors.
- For SVR, the support vectors mark the outlier points. SVR tries to limit the effect of those outliers on the regression function.
- Nevertheless, the prediction y(x) only depends on the support vectors.
(Figure: least-squares regression vs. support vector regression.)
Image source: Christoph Lampert

Example: Head Pose Estimation
Procedure
- Detect faces in the image
- Compute a gradient representation of the face region
- Train support vector regression for yaw and tilt (separately)
Y. Li, S. Gong, J. Sherrah, H. Liddell, Support Vector Machine Based Multi-View Face Detection and Recognition, Image & Vision Computing, 2004.
Slide credit: Christoph Lampert

Topics of This Lecture
Kernel Methods for Regression
- Dual representation
- Kernel Ridge Regression
Support Vector Regression
- Recap: SVMs for classification
- Error function
- Primal form
- Dual form
Randomized Regression Forests
- Regression trees
- Optimization functions for regression

Regression Forest
Training data: input data points x with continuous outputs/labels.
- Leaf model: the model specialization for regression (what is stored and predicted at each leaf).
- Objective function for training node j: an information gain defined on continuous variables, used to choose the node's weak learner.

Regression forest: the node weak learner model
The node weak learner splits the data arriving at node j according to its test parameters. Examples of weak learners (feature responses shown for a 2D example):
- Axis-aligned split
- Oriented line (a generic line in homogeneous coordinates)
- Conic section (a matrix representing a conic)
In general, a weak learner may select only a very small subset of the features. See Appendix C for the relation with the kernel trick.

Regression forest: the predictor model
What do we do at the leaf? Examples of leaf (predictor) models:
- Constant
- Polynomial (linear for n = 1, constant for n = 0)
- Probabilistic-linear

Regression forest: the objective function
Computing the regression information gain at node j:
- The regression information gain is defined via the differential entropy of a Gaussian fitted to the continuous labels: the gain of a split is the entropy at node j minus the size-weighted entropies of its children.
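A hedged sketch (assuming 1D continuous labels and a Gaussian model at each node, one of the simplest cases of this framework) of evaluating the information gain of candidate axis-aligned splits:

```python
import numpy as np

def gaussian_entropy(y):
    """Differential entropy of a 1D Gaussian fitted to the labels y."""
    var = np.var(y) + 1e-12           # small constant avoids log(0) for pure nodes
    return 0.5 * np.log(2 * np.pi * np.e * var)

def information_gain(y_parent, y_left, y_right):
    """Entropy of the parent minus the size-weighted entropies of the children."""
    n, nl, nr = len(y_parent), len(y_left), len(y_right)
    return (gaussian_entropy(y_parent)
            - (nl / n) * gaussian_entropy(y_left)
            - (nr / n) * gaussian_entropy(y_right))

def best_axis_aligned_split(x, y):
    """Scan thresholds on one feature and return the one maximizing the gain."""
    best_tau, best_gain = None, -np.inf
    for tau in np.unique(x)[1:]:      # skip the minimum so both children are non-empty
        left, right = y[x < tau], y[x >= tau]
        gain = information_gain(y, left, right)
        if gain > best_gain:
            best_tau, best_gain = tau, gain
    return best_tau, best_gain
```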

Regression forest: the objective function
Comparison with Breiman's error of fit (CART)
- The error-of-fit objective is a special instance of the more general information-theoretic objective function (the information gain at node j).

Regression forest: the ensemble model
The forest output probability is obtained by combining the posteriors of the individual trees t = 1, ..., T, e.g.
    p(y | x) = (1/T) Σ_t p_t(y | x)
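A small sketch of the ensemble step, under the assumption of Gaussian tree posteriors: each tree returns a mean and a variance for a test point, and the forest combines them as an equal-weight mixture. The numbers below are made up for illustration.

```python
import numpy as np

# Hypothetical per-tree predictions for one test point x:
# each tree's leaf stores a Gaussian posterior p_t(y|x) = N(mu_t, var_t).
tree_means = np.array([1.1, 0.9, 1.3])
tree_vars = np.array([0.05, 0.08, 0.04])

# Forest posterior as an equal-weight mixture: p(y|x) = (1/T) sum_t p_t(y|x).
forest_mean = tree_means.mean()
# Mixture variance = average within-tree variance + spread of the tree means
# (law of total variance); disagreement between trees increases the uncertainty.
forest_var = tree_vars.mean() + np.mean((tree_means - forest_mean) ** 2)
print(forest_mean, forest_var)
```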

Regression forest: probabilistic, non-linear regression
Training and testing the different trees in the forest (max. tree depth D = 2).
Generalization properties:
- Smooth interpolating behaviour in the gaps between training data.
- Uncertainty increases with distance from the training data.
Parameters: T = 400, D = 2, weak learner = axis-aligned, leaf model = probabilistic line.
(Figures: training points, tree posteriors, forest posterior. See Appendix A for probabilistic line fitting.)

References and Further Reading
More information on Support Vector Regression can be found in the corresponding chapter of Bishop's book. You can also look at Schölkopf & Smola (some chapters are available online).
- Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
- B. Schölkopf, A. Smola, Learning with Kernels, MIT Press, 2002.