Machine Learning – Lecture 15: Regression II
16.12.2013
Bastian Leibe, RWTH Aachen
http://www.vision.rwth-aachen.de
leibe@vision.rwth-aachen.de

Announcements
Lecture evaluation: please fill out the forms.

Course Outline
- Fundamentals (2 weeks): Bayes Decision Theory, Probability Density Estimation
- Discriminative Approaches (5 weeks): Linear Discriminant Functions, Statistical Learning Theory & SVMs, Ensemble Methods & Boosting, Decision Trees & Randomized Trees, Model Selection, Regression Problems
- Generative Models (4 weeks): Bayesian Networks, Markov Random Fields

Recap: Least-Squares Regression
We are given training data points {x_1, ..., x_N} and associated continuous function values {t_1, ..., t_N}. Start with a linear regressor y(x) = w^T x + w_0 and try to enforce y(x_n) = t_n, i.e., one linear equation for each training data point / label pair. This is the same basic setup as in least-squares classification; only the target values are now continuous. Minimizing the sum-of-squares error gives the closed-form solution
  w = (X^T X)^{-1} X^T t.
(Slide credit: Bernt Schiele)

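As a minimal NumPy sketch of this closed-form fit (the function and variable names, and the toy data, are illustrative assumptions, not from the lecture):

```python
import numpy as np

def least_squares_fit(X, t):
    """Closed-form least-squares regression.

    X : (N, D) matrix of training inputs (one row per data point)
    t : (N,) vector of continuous target values
    Returns the weight vector minimizing sum_n (w^T x_n - t_n)^2.
    """
    # Augment with a constant column so the bias w0 is learned as well.
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    # Pseudo-inverse solution w = (X^T X)^{-1} X^T t, computed stably via lstsq.
    w, *_ = np.linalg.lstsq(X_aug, t, rcond=None)
    return w

# Usage: fit a noisy linear function of one variable.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 1))
t = 2.0 * X[:, 0] - 0.5 + 0.1 * rng.standard_normal(50)
w = least_squares_fit(X, t)   # approximately [-0.5, 2.0]
```
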
Recap: Linear Basis Function Models
Generally, we consider models of the form
  y(x, w) = \sum_{j} w_j \phi_j(x) = w^T \phi(x),
where the φ_j(x) are known as basis functions. In the simplest case, we use linear basis functions: φ_d(x) = x_d. Other popular choices are polynomial, Gaussian, and sigmoidal basis functions.

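A small sketch of how such a design matrix could be built, here for Gaussian basis functions; the function name, the width s=0.2, and the number of centers are illustrative choices, not values from the slides:

```python
import numpy as np

def gaussian_design_matrix(x, centers, s=0.2):
    """Design matrix Phi with Gaussian basis functions
    phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), plus a constant phi_0 = 1."""
    phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * s ** 2))
    return np.hstack([np.ones((x.shape[0], 1)), phi])

# Usage: 9 Gaussian bumps spread over [0, 1].
x = np.linspace(0.0, 1.0, 25)
Phi = gaussian_design_matrix(x, centers=np.linspace(0.0, 1.0, 9))
print(Phi.shape)  # (25, 10)
```
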
Recap: Regularization
Problem: overfitting, especially for small datasets and complex models. Observation: the model coefficients get very large.
Workaround: regularization, i.e., penalize large coefficient values:
  E(w) = \frac{1}{2} \sum_{n=1}^{N} (t_n - w^T \phi(x_n))^2 + \frac{\lambda}{2} \|w\|^2.
This form with a quadratic regularizer is called Ridge Regression and has the closed-form solution
  w = (\lambda I + \Phi^T \Phi)^{-1} \Phi^T t.
Effect of regularization: it keeps the matrix inverse well-conditioned.

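A hedged sketch of this ridge closed form, reusing a design matrix such as the one from the previous snippet; ridge_fit and lam are illustrative names:

```python
import numpy as np

def ridge_fit(Phi, t, lam=1e-2):
    """Closed-form ridge regression: w = (lambda*I + Phi^T Phi)^{-1} Phi^T t."""
    M = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

def ridge_predict(Phi_new, w):
    """Evaluate the fitted model on new design-matrix rows."""
    return Phi_new @ w

# Usage (with the Gaussian design matrix from the previous sketch):
# w = ridge_fit(Phi, t, lam=0.1)
# y = ridge_predict(Phi, w)
```
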
Recap: Regularized Least-Squares
Consider more general regularization functions, the "L_q norms":
  E(w) = \frac{1}{2} \sum_{n=1}^{N} (t_n - w^T \phi(x_n))^2 + \frac{\lambda}{2} \sum_{j} |w_j|^q.
Effect: sparsity for q ≤ 1; the minimization then tends to set many coefficients exactly to zero.
(Image source: C.M. Bishop, 2006)

Recap: The Lasso
L_1 regularization ("The Lasso"):
  E(w) = \frac{1}{2} \sum_{n=1}^{N} (t_n - w^T \phi(x_n))^2 + \lambda \sum_{j} |w_j|.
Interpretation as Bayes estimation: we can think of the regularizer |w_j|^q as (the negative of) the log-prior density for w_j. The prior corresponding to the Lasso (q = 1) is the Laplacian distribution.
(Image source: Wikipedia)

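The lecture does not specify an optimizer for the Lasso; as one possible illustration, here is a minimal proximal-gradient (ISTA) sketch, which uses the soft-thresholding operator induced by the L_1 penalty. All names and the fixed iteration count are assumptions:

```python
import numpy as np

def soft_threshold(v, thresh):
    """Proximal operator of the L1 norm (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def lasso_ista(Phi, t, lam=0.1, n_iter=1000):
    """Minimize 0.5*||Phi w - t||^2 + lam*||w||_1 by proximal gradient (ISTA)."""
    # Step size 1/L, where L is the Lipschitz constant of the smooth part.
    L = np.linalg.norm(Phi, 2) ** 2
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ w - t)
        w = soft_threshold(w - grad / L, lam / L)
    return w
```

Because of the soft-thresholding step, many coefficients end up exactly zero, which is the sparsity effect described above.
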
Topics of This Lecture
- Kernel Methods for Regression: dual representation, Kernel Ridge Regression
- Support Vector Regression: recap of SVMs for classification, error function, primal form, dual form
- Randomized Regression Forests: regression trees, optimization functions for regression

Kernel Methods for Regression
Dual representations: many linear models for regression and classification can be reformulated in terms of a dual representation, where predictions are based on linear combinations of a kernel function evaluated at the training data points. For models based on a fixed nonlinear feature space mapping φ(x), the kernel function is given by
  k(x, x') = \phi(x)^T \phi(x').
We have seen that by substituting the inner product by the kernel, we can achieve interesting extensions of many well-known algorithms. Let's try this also for regression.

Dual Representations: Derivation
Consider a regularized linear regression model
  J(w) = \frac{1}{2} \sum_{n=1}^{N} (w^T \phi(x_n) - t_n)^2 + \frac{\lambda}{2} w^T w
with the solution (setting the gradient to zero)
  w = -\frac{1}{\lambda} \sum_{n=1}^{N} (w^T \phi(x_n) - t_n) \phi(x_n).
We can write this as a linear combination of the φ(x_n) with coefficients that are functions of w:
  w = \Phi^T a, \quad \text{with} \quad a_n = -\frac{1}{\lambda} (w^T \phi(x_n) - t_n).

Dual Representations: Derivation (cont'd)
Dual definition: instead of working with w, we can formulate the optimization for a by substituting w = Φ^T a into J(w). Define the kernel matrix K = ΦΦ^T with elements
  K_{nm} = \phi(x_n)^T \phi(x_m) = k(x_n, x_m).
Now the sum-of-squares error can be written as
  J(a) = \frac{1}{2} a^T K K a - a^T K t + \frac{1}{2} t^T t + \frac{\lambda}{2} a^T K a.

Kernel Ridge Regression
Solving for a, we obtain
  a = (K + \lambda I_N)^{-1} t.
The prediction for a new input x is
  y(x) = k(x)^T (K + \lambda I_N)^{-1} t,
where k(x) denotes the vector with elements k_n(x) = k(x_n, x). The dual formulation allows the solution to be expressed entirely in terms of the kernel function k(x, x'). The resulting form is known as Kernel Ridge Regression and allows us to perform non-linear regression.
(Image source: Christoph Lampert)

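A compact sketch of kernel ridge regression with an RBF kernel, following the dual formulas above; the kernel choice and the gamma and lam values are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(A, B, gamma=10.0):
    """Gaussian (RBF) kernel matrix with entries exp(-gamma * ||a - b||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_ridge_fit(X, t, lam=1e-2, gamma=10.0):
    """Dual coefficients a = (K + lambda*I)^{-1} t."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), t)

def kernel_ridge_predict(X_new, X_train, a, gamma=10.0):
    """Prediction y(x) = k(x)^T a."""
    return rbf_kernel(X_new, X_train, gamma) @ a

# Usage: non-linear regression on a noisy sine curve.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(40, 1))
t = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(40)
a = kernel_ridge_fit(X, t)
y = kernel_ridge_predict(X, X, a)
```
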
Recap: SVMs for Classification
Traditional soft-margin formulation:
  \min_{w, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{n=1}^{N} \xi_n
subject to the constraints t_n y(x_n) ≥ 1 - ξ_n and ξ_n ≥ 0 ("maximize the margin"; "most points should be on the correct side of the margin").
A different way of looking at it: we can reformulate the constraints into the objective function,
  \min_{w} \; \frac{1}{2}\|w\|^2 + C \sum_{n=1}^{N} [1 - t_n y(x_n)]_+,
where [x]_+ := max{0, x}. The first term is an L_2 regularizer, the second is the "hinge loss".
(Slide adapted from Christoph Lampert)

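A tiny sketch of the unconstrained hinge-loss objective above, for a linear model; omitting the bias term and the chosen names are simplifying assumptions:

```python
import numpy as np

def svm_objective(w, X, t, C=1.0):
    """Unconstrained soft-margin objective 0.5*||w||^2 + C * sum [1 - t*y]_+.

    t must contain labels in {-1, +1}; y(x) = w^T x (bias omitted for brevity).
    """
    margins = 1.0 - t * (X @ w)
    hinge = np.maximum(0.0, margins)   # [x]_+ = max{0, x}
    return 0.5 * w @ w + C * hinge.sum()
```
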
SVM – Discussion
The hinge loss enforces sparsity: only a subset of the training data points actually influences the decision boundary. This is different from the sparsity obtained through an L_1 regularizer; there, only a subset of the input dimensions is used. But we can now also apply different regularizers, e.g.,
- L_1-SVMs
- L_0-SVMs
(Slide adapted from Christoph Lampert)

SVMs for Regression
Linear regression: minimize a regularized quadratic error function,
  \frac{1}{2} \sum_{n=1}^{N} (y(x_n) - t_n)^2 + \frac{\lambda}{2} \|w\|^2.
Problem: this is sensitive to outliers, because the quadratic error function heavily penalizes large residuals. This is the case even for (Kernel) Ridge Regression, although regularization helps.
(Image source: C.M. Bishop, C. Lampert)

SVMs for Regression (cont'd)
Obtaining sparse solutions: define an ε-insensitive error function,
  E_\varepsilon(y(x) - t) = \begin{cases} 0, & |y(x) - t| < \varepsilon \\ |y(x) - t| - \varepsilon, & \text{otherwise,} \end{cases}
and minimize the regularized function
  C \sum_{n=1}^{N} E_\varepsilon(y(x_n) - t_n) + \frac{1}{2}\|w\|^2.
(Image source: C.M. Bishop)

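For concreteness, a one-function implementation of the ε-insensitive error; eps=0.1 is an arbitrary illustrative value:

```python
import numpy as np

def eps_insensitive_loss(y_pred, t, eps=0.1):
    """epsilon-insensitive error: zero inside the eps-tube, linear outside."""
    residual = np.abs(y_pred - t)
    return np.maximum(0.0, residual - eps)
```
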
Dealing with Noise and Outliers
Introduce slack variables: we now need two slack variables, ξ_n ≥ 0 and ξ̂_n ≥ 0. A target point lies in the ε-tube if y_n - ε ≤ t_n ≤ y_n + ε. The corresponding conditions are
  t_n \le y(x_n) + \varepsilon + \xi_n, \qquad t_n \ge y(x_n) - \varepsilon - \hat{\xi}_n.
(Image source: C.M. Bishop)

Dealing with Noise and Outliers (cont'd)
Optimization with slack variables: using the conditions above, the error function can then be rewritten as
  C \sum_{n=1}^{N} (\xi_n + \hat{\xi}_n) + \frac{1}{2}\|w\|^2,
to be minimized subject to ξ_n ≥ 0, ξ̂_n ≥ 0, and the ε-tube conditions from the previous slide.

Support Vector Regression – Primal Form
Lagrangian primal form:
  L = C \sum_{n=1}^{N} (\xi_n + \hat{\xi}_n) + \frac{1}{2}\|w\|^2 - \sum_{n=1}^{N} (\mu_n \xi_n + \hat{\mu}_n \hat{\xi}_n) - \sum_{n=1}^{N} a_n (\varepsilon + \xi_n + y_n - t_n) - \sum_{n=1}^{N} \hat{a}_n (\varepsilon + \hat{\xi}_n - y_n + t_n).
Solving for the variables (setting the derivatives w.r.t. w, b, ξ_n, ξ̂_n to zero) gives
  w = \sum_{n=1}^{N} (a_n - \hat{a}_n) \phi(x_n), \qquad \sum_{n=1}^{N} (a_n - \hat{a}_n) = 0, \qquad a_n + \mu_n = C, \qquad \hat{a}_n + \hat{\mu}_n = C.

Support Vector Regression – Dual Form
From this, we can derive the dual form: maximize
  \tilde{L}(a, \hat{a}) = -\frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} (a_n - \hat{a}_n)(a_m - \hat{a}_m)\, k(x_n, x_m) - \varepsilon \sum_{n=1}^{N} (a_n + \hat{a}_n) + \sum_{n=1}^{N} (a_n - \hat{a}_n)\, t_n
under the conditions 0 ≤ a_n ≤ C, 0 ≤ â_n ≤ C, and Σ_n (a_n - â_n) = 0. Predictions for new inputs are then made using
  y(x) = \sum_{n=1}^{N} (a_n - \hat{a}_n)\, k(x, x_n) + b.

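As a usage sketch, scikit-learn's SVR solves this dual problem internally; scikit-learn is not referenced in the lecture, and the hyperparameter values below are illustrative:

```python
import numpy as np
from sklearn.svm import SVR

# Fit a kernelized SVR on a noisy sine curve.
rng = np.random.default_rng(2)
X = np.sort(rng.uniform(0, 1, size=(60, 1)), axis=0)
t = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(60)

svr = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=10.0)
svr.fit(X, t)
y = svr.predict(X)

# Only points on or outside the eps-tube become support vectors,
# and the prediction depends only on them.
print("number of support vectors:", len(svr.support_))
```
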
KKT Conditions
KKT conditions:
  a_n (\varepsilon + \xi_n + y_n - t_n) = 0, \qquad \hat{a}_n (\varepsilon + \hat{\xi}_n - y_n + t_n) = 0,
  (C - a_n)\, \xi_n = 0, \qquad (C - \hat{a}_n)\, \hat{\xi}_n = 0.
Observations: a coefficient a_n can only be non-zero if the first constraint is active, i.e., if a point lies either on or above the ε-tube. Similarly, a non-zero coefficient â_n requires a point on or below the ε-tube. The first two constraints cannot both be active at the same time, so either a_n or â_n (or both) must be zero. The support vectors are those points for which a_n ≠ 0 or â_n ≠ 0, i.e., the points on the boundary of or outside the ε-tube.
(Image source: C.M. Bishop)

Discussion
Slightly different interpretation: for SVMs, the classification function depends only on the support vectors. For SVR, the support vectors mark outlier points, and SVR tries to limit the effect of those outliers on the regression function. Nevertheless, the prediction y(x) again depends only on the support vectors.
(Figure: least-squares regression vs. support vector regression. Image source: Christoph Lampert)

Example: Head Pose Estimation
Procedure: detect faces in the image, compute a gradient representation of the face region, and train support vector regression for yaw and tilt separately.
Y. Li, S. Gong, J. Sherrah, H. Liddell: Support vector machine based multi-view face detection and recognition. Image & Vision Computing, 2004.
(Slide credit: Christoph Lampert)

Regression Forest
Model specialization for regression: the output/labels are continuous, so the information gain used as the objective function for training node j must be defined on continuous variables, and each leaf stores a leaf (regression) model. Training node j means choosing the node's weak learner by optimizing this objective over the training data reaching it; at test time an input data point is routed to a leaf, whose model produces the prediction.

Regression Forest: The Node Weak Learner Model
The node weak learner is a test with parameters θ_j that splits the data arriving at node j. Examples of node weak learners (feature responses shown for a 2D example):
- Axis-aligned: threshold a single feature dimension.
- Oriented line: threshold a generic line in homogeneous coordinates.
- Conic section: threshold a quadratic form given by a matrix representing a conic.
In general, the feature response may select only a very small subset of the features. See Appendix C for the relation with the kernel trick.

Regression Forest: The Predictor Model
What do we do at the leaf? Examples of leaf (predictor) models:
- Constant
- Polynomial (note: linear for n=1, constant for n=0)
- Probabilistic-linear

Regression Forest: Objective Function
Computing the regression information gain at node j: as in classification, the objective is an information gain
  I_j = H(S_j) - \sum_{i \in \{L, R\}} \frac{|S_j^i|}{|S_j|} H(S_j^i),
but with the entropy now defined on continuous variables. Under a Gaussian model, the differential entropy of the targets in a set S is
  H(S) = \frac{1}{2} \log\!\left((2\pi e)^d \, |\Lambda(S)|\right),
with Λ(S) the covariance of the targets in S.

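A small sketch of this information gain for scalar targets under a Gaussian (constant) leaf model; the variance floor and all names are assumptions:

```python
import numpy as np

def gaussian_entropy(t):
    """Differential entropy of a Gaussian fit to 1D targets t:
    H = 0.5 * log(2*pi*e*var)."""
    var = np.var(t) + 1e-12               # small floor avoids log(0)
    return 0.5 * np.log(2.0 * np.pi * np.e * var)

def regression_info_gain(t_parent, t_left, t_right):
    """Information gain of a candidate split under a Gaussian leaf model."""
    n = len(t_parent)
    gain = gaussian_entropy(t_parent)
    gain -= len(t_left) / n * gaussian_entropy(t_left)
    gain -= len(t_right) / n * gaussian_entropy(t_right)
    return gain

# Usage: score an axis-aligned split at threshold tau on feature 0.
# mask = X[:, 0] < tau
# g = regression_info_gain(t, t[mask], t[~mask])
```
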
Regression Forest: Objective Function (cont'd)
Comparison with Breiman's error of fit (CART): CART selects splits at node j by minimizing the sum-of-squared residuals (error of fit) of the children. For a constant Gaussian leaf model, maximizing the information gain above leads to the same choice, so the error-of-fit objective is a special instance of the more general information-theoretic objective function.

Regression Forest: The Ensemble Model
Each tree t = 1, ..., T outputs a posterior p_t(y|x); the forest output probability is the average over all trees,
  p(y|x) = \frac{1}{T} \sum_{t=1}^{T} p_t(y|x).

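A minimal sketch of this ensemble averaging under the assumption that each tree's leaf outputs a 1D Gaussian; summarizing the resulting equal-weight mixture by its mean and variance is my addition, not necessarily the exact form used on the slide:

```python
import numpy as np

def forest_posterior(tree_means, tree_vars):
    """Combine per-tree Gaussian leaf predictions N(mu_t, var_t) for one input
    into the forest posterior (an equally weighted mixture of Gaussians) and
    return that mixture's mean and variance."""
    mu = np.asarray(tree_means, dtype=float)
    var = np.asarray(tree_vars, dtype=float)
    mean = mu.mean()
    # Law of total variance for an equal-weight mixture.
    variance = var.mean() + (mu ** 2).mean() - mean ** 2
    return mean, variance

# Usage: three trees predicting for the same test point.
m, v = forest_posterior([1.9, 2.1, 2.4], [0.05, 0.08, 0.20])
```
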
Regression Forest: Probabilistic, Non-Linear Regression
Example: training and testing the different trees in the forest (parameters: T=400 trees, max tree depth D=2, weak learner = axis-aligned, leaf model = probabilistic line). The figures show the training points, the individual tree posteriors, and the forest posterior.
Generalization properties:
- Smooth interpolating behaviour in the gaps between training data.
- Uncertainty increases with distance from the training data.
See Appendix A for probabilistic line fitting.

References and Further Reading
More information on Support Vector Regression can be found in Chapter 7.1.4 of Bishop's book. You can also look at Schölkopf & Smola (some chapters are available online).
- Christopher M. Bishop: Pattern Recognition and Machine Learning. Springer, 2006.
- B. Schölkopf, A. Smola: Learning with Kernels. MIT Press, 2002. http://www.learning-with-kernels.org/
