
Machine Learning – Lecture 15: Regression II
Perceptual and Sensory Augmented Computing, Machine Learning WS 13/14
16.12.2013, Bastian Leibe, RWTH Aachen


1 Machine Learning – Lecture 15: Regression II
16.12.2013
Bastian Leibe, RWTH Aachen
http://www.vision.rwth-aachen.de
leibe@vision.rwth-aachen.de

2 Announcements
Lecture evaluation
 Please fill out the forms...

3 Course Outline
Fundamentals (2 weeks)
 Bayes Decision Theory
 Probability Density Estimation
Discriminative Approaches (5 weeks)
 Linear Discriminant Functions
 Statistical Learning Theory & SVMs
 Ensemble Methods & Boosting
 Decision Trees & Randomized Trees
 Model Selection
 Regression Problems
Generative Models (4 weeks)
 Bayesian Networks
 Markov Random Fields

4 Recap: Least-Squares Regression
We are given
 Training data points: x_1, ..., x_N
 Associated function values: t_1, ..., t_N
Start with a linear regressor:
 Try to enforce y(x_n; w) = w^T x_n = t_n for every n.
 One linear equation for each training data point / label pair.
 Same basic setup as in least-squares classification!
 Only the target values are now continuous.
Closed-form solution (see the reconstruction below).
Slide credit: Bernt Schiele
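The closed-form solution referenced above, written out in standard design-matrix notation as a reconstruction (the symbols X, t, w follow the usual conventions and are not copied from the slide's own formulas):

```latex
% Stack the training inputs as rows of the design matrix X and the targets in t.
% Least-squares objective:
\[
  E(\mathbf{w}) \;=\; \tfrac{1}{2}\sum_{n=1}^{N}\bigl(\mathbf{w}^{\top}\mathbf{x}_n - t_n\bigr)^{2}
  \;=\; \tfrac{1}{2}\,\lVert X\mathbf{w} - \mathbf{t}\rVert^{2}
\]
% Setting the gradient to zero gives the normal equations and the closed form:
\[
  X^{\top}\bigl(X\mathbf{w} - \mathbf{t}\bigr) = 0
  \quad\Longrightarrow\quad
  \mathbf{w} = \bigl(X^{\top}X\bigr)^{-1}X^{\top}\mathbf{t}
\]
```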

5 Recap: Linear Basis Function Models
Generally, we consider models of the form y(x, w) = Σ_j w_j φ_j(x),
 where the φ_j(x) are known as basis functions.
 In the simplest case, we use linear basis functions: φ_d(x) = x_d.
Other popular basis functions: polynomial, Gaussian, sigmoid (see the examples below).
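The three basis-function families named on the slide, in the usual form from Bishop; treat the exact parameterization (location μ_j and scale s) as a standard reconstruction rather than the slide's own notation:

```latex
\[
  y(\mathbf{x},\mathbf{w}) \;=\; \sum_{j=0}^{M-1} w_j\,\phi_j(\mathbf{x}) \;=\; \mathbf{w}^{\top}\boldsymbol{\phi}(\mathbf{x})
\]
% Popular choices of basis function (1D case):
\[
  \phi_j(x) = x^{j} \;\;\text{(polynomial)}, \qquad
  \phi_j(x) = \exp\!\Bigl(-\tfrac{(x-\mu_j)^2}{2s^2}\Bigr) \;\;\text{(Gaussian)}, \qquad
  \phi_j(x) = \sigma\!\Bigl(\tfrac{x-\mu_j}{s}\Bigr) \;\;\text{(sigmoid)}
\]
% with the logistic sigmoid \sigma(a) = 1/(1 + e^{-a}).
```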

6 Recap: Regularization
Problem: overfitting
 Especially for small datasets and complex models.
 Observation: the model coefficients get very large.
Workaround: regularization
 Penalize large coefficient values.
 This form with a quadratic regularizer is called Ridge Regression.
 Closed-form solution (see below).
Effect of regularization: keeps the matrix inverse well-conditioned.
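A reconstruction of the ridge-regression objective and its closed-form solution in Bishop's notation (Φ is the design matrix of basis-function values, λ the regularization weight):

```latex
% Ridge-regression objective with a quadratic regularizer:
\[
  E(\mathbf{w}) \;=\; \tfrac{1}{2}\sum_{n=1}^{N}\bigl(t_n - \mathbf{w}^{\top}\boldsymbol{\phi}(\mathbf{x}_n)\bigr)^{2}
  \;+\; \tfrac{\lambda}{2}\,\mathbf{w}^{\top}\mathbf{w}
\]
% Closed-form solution; adding \lambda I shifts the eigenvalues of \Phi^T \Phi away
% from zero, which is why the inverse stays well-conditioned:
\[
  \mathbf{w} \;=\; \bigl(\lambda\mathbf{I} + \Phi^{\top}\Phi\bigr)^{-1}\Phi^{\top}\mathbf{t}
\]
```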

7 Recap: Regularized Least-Squares
Consider more general regularization functions
 “L_q norms”: penalize Σ_j |w_j|^q.
Effect: sparsity for q ≤ 1.
 Minimization tends to set many coefficients exactly to zero.
Image source: C.M. Bishop, 2006
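The general L_q-regularized objective, written out as a standard reconstruction:

```latex
\[
  \tilde{E}(\mathbf{w}) \;=\; \tfrac{1}{2}\sum_{n=1}^{N}\bigl(t_n - \mathbf{w}^{\top}\boldsymbol{\phi}(\mathbf{x}_n)\bigr)^{2}
  \;+\; \tfrac{\lambda}{2}\sum_{j=1}^{M}\lvert w_j\rvert^{q}
\]
% q = 2: ridge regression (quadratic penalty)
% q = 1: the Lasso (sparsity-inducing, still convex)
% q < 1: even sparser solutions, but the penalty is no longer convex
```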

8 Recap: The Lasso
L_1 regularization (“The Lasso”)
Interpretation as Bayes estimation
 We can think of the penalty |w_j|^q as a (negative) log-prior density for w_j.
Prior for the Lasso (q = 1): the Laplacian distribution.
Image source: Wikipedia
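Written out as a standard reconstruction (the scale parameter b of the Laplacian, and its exact relation to λ, are not given on the slide):

```latex
% Lasso objective (q = 1):
\[
  \tilde{E}(\mathbf{w}) \;=\; \tfrac{1}{2}\sum_{n=1}^{N}\bigl(t_n - \mathbf{w}^{\top}\boldsymbol{\phi}(\mathbf{x}_n)\bigr)^{2}
  \;+\; \tfrac{\lambda}{2}\sum_{j}\lvert w_j\rvert
\]
% Corresponding Laplacian prior on each coefficient, whose negative log-density
% is the L1 penalty up to scale and an additive constant:
\[
  p(w_j) \;=\; \frac{1}{2b}\,\exp\!\Bigl(-\frac{\lvert w_j\rvert}{b}\Bigr),
  \qquad
  -\log p(w_j) \;=\; \frac{\lvert w_j\rvert}{b} + \text{const.}
\]
```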

9 Topics of This Lecture
Kernel Methods for Regression
 Dual representation
 Kernel Ridge Regression
Support Vector Regression
 Recap: SVMs for classification
 Error function
 Primal form
 Dual form
Randomized Regression Forests
 Regression trees
 Optimization functions for regression

10 Kernel Methods for Regression
Dual representations
 Many linear models for regression and classification can be reformulated in terms of a dual representation, where predictions are based on linear combinations of a kernel function evaluated at the training data points.
 For models that are based on a fixed nonlinear feature space mapping φ(x), the kernel function is given by k(x, x') = φ(x)^T φ(x').
 We have seen that by substituting the inner product with the kernel, we can obtain interesting extensions of many well-known algorithms…
 Let’s try this also for regression.

11 Dual Representations: Derivation
Consider a regularized linear regression model; setting the gradient of the regularized error J(w) to zero yields the solution for w.
 We can write this solution as a linear combination of the φ(x_n), with coefficients a_n that are themselves functions of w (see the reconstruction below).
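The missing equations, reconstructed from the standard dual derivation (Bishop, Ch. 6.1); the notation follows the previous slides:

```latex
\[
  J(\mathbf{w}) \;=\; \tfrac{1}{2}\sum_{n=1}^{N}\bigl(\mathbf{w}^{\top}\boldsymbol{\phi}(\mathbf{x}_n) - t_n\bigr)^{2}
  \;+\; \tfrac{\lambda}{2}\,\mathbf{w}^{\top}\mathbf{w}
\]
% Setting the gradient with respect to w to zero:
\[
  \mathbf{w} \;=\; -\frac{1}{\lambda}\sum_{n=1}^{N}\bigl(\mathbf{w}^{\top}\boldsymbol{\phi}(\mathbf{x}_n) - t_n\bigr)\,\boldsymbol{\phi}(\mathbf{x}_n)
  \;=\; \Phi^{\top}\mathbf{a},
  \qquad
  a_n \;=\; -\frac{1}{\lambda}\bigl(\mathbf{w}^{\top}\boldsymbol{\phi}(\mathbf{x}_n) - t_n\bigr)
\]
```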

12 Dual Representations: Derivation
Dual definition
 Instead of working with w, we can formulate the optimization for a by substituting w = Φ^T a into J(w).
 Define the kernel (Gram) matrix K = ΦΦ^T with elements K_nm = φ(x_n)^T φ(x_m) = k(x_n, x_m).
 Now the sum-of-squares error can be written entirely in terms of K (see below).
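The dual objective referenced above, reconstructed in the standard form:

```latex
% Substituting w = \Phi^T a and K = \Phi\Phi^T with K_{nm} = k(x_n, x_m):
\[
  J(\mathbf{a}) \;=\; \tfrac{1}{2}\,\mathbf{a}^{\top}K K\,\mathbf{a}
  \;-\; \mathbf{a}^{\top}K\,\mathbf{t}
  \;+\; \tfrac{1}{2}\,\mathbf{t}^{\top}\mathbf{t}
  \;+\; \tfrac{\lambda}{2}\,\mathbf{a}^{\top}K\,\mathbf{a}
\]
```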

13 Kernel Ridge Regression
 Solving for a, we obtain a = (K + λI_N)^{-1} t.
Prediction for a new input x:
 Writing k(x) for the vector with elements k_n(x) = k(x_n, x), the prediction is y(x) = k(x)^T (K + λI_N)^{-1} t.
 The dual formulation thus allows the solution to be expressed entirely in terms of the kernel function k(x, x').
 The resulting method is known as Kernel Ridge Regression and allows us to perform non-linear regression.
Image source: Christoph Lampert
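A minimal NumPy sketch of kernel ridge regression with a Gaussian (RBF) kernel, just to make the two formulas above concrete. The helper names (rbf_kernel, fit_kernel_ridge, predict_kernel_ridge) and the gamma/lam values are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel k(x, x') = exp(-gamma * ||x - x'||^2) between row sets A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def fit_kernel_ridge(X, t, lam=0.1, gamma=1.0):
    """Dual coefficients a = (K + lam*I)^{-1} t."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), t)

def predict_kernel_ridge(X_train, a, X_new, gamma=1.0):
    """Prediction y(x) = k(x)^T a with k_n(x) = k(x_n, x)."""
    return rbf_kernel(X_new, X_train, gamma) @ a

# Toy 1D example: noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(30, 1))
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
a = fit_kernel_ridge(X, t, lam=0.1, gamma=1.0)
y = predict_kernel_ridge(X, a, np.linspace(0, 2 * np.pi, 5)[:, None])
```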

14 Topics of This Lecture
Kernel Methods for Regression
 Dual representation
 Kernel Ridge Regression
Support Vector Regression
 Recap: SVMs for classification
 Error function
 Primal form
 Dual form
Randomized Regression Forests
 Regression trees
 Optimization functions for regression

15 Recap: SVMs for Classification
Traditional soft-margin formulation: an L_2 regularizer (“maximize the margin”) plus slack penalties, subject to the margin constraints (“most points should be on the correct side of the margin”).
Different way of looking at it
 We can fold the constraints into the objective function, giving the hinge-loss formulation (see below), where [x]_+ := max{0, x}.
Slide adapted from Christoph Lampert
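The two equivalent formulations mentioned on the slide, reconstructed in standard notation (t_n ∈ {−1, +1} are the class labels, C the trade-off constant):

```latex
% Soft-margin primal:
\[
  \min_{\mathbf{w},b,\boldsymbol{\xi}}\;\tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{n=1}^{N}\xi_n
  \quad\text{s.t.}\quad t_n\bigl(\mathbf{w}^{\top}\mathbf{x}_n + b\bigr) \ge 1 - \xi_n,\;\; \xi_n \ge 0
\]
% Equivalent unconstrained hinge-loss form:
\[
  \min_{\mathbf{w},b}\;\tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
  + C\sum_{n=1}^{N}\bigl[\,1 - t_n\bigl(\mathbf{w}^{\top}\mathbf{x}_n + b\bigr)\bigr]_{+}
\]
```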

16 SVM – Discussion
SVM optimization function: hinge loss plus L_2 regularizer.
Hinge loss enforces sparsity
 Only a subset of the training data points actually influences the decision boundary.
 This is different from the sparsity obtained through the regularizer! There, only a subset of the input dimensions is used.
 But we can now also apply different regularizers, e.g.
 – L_1-SVMs
 – L_0-SVMs
Slide adapted from Christoph Lampert

17 SVMs for Regression
Linear regression
 Minimize a regularized quadratic error function.
Problem
 Sensitive to outliers, because the quadratic error function penalizes large residuals heavily.
 This is the case even for (Kernel) Ridge Regression, although regularization helps.
Image source: C.M. Bishop, C. Lampert

18 SVMs for Regression
Obtaining sparse solutions
 Define an ε-insensitive error function E_ε, which is zero inside a tube of width ε around the prediction,
 and minimize the corresponding regularized error function (see below).
Image source: C.M. Bishop
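The ε-insensitive error function and the regularized objective, reconstructed in Bishop's notation:

```latex
\[
  E_{\epsilon}\bigl(y(\mathbf{x}) - t\bigr) =
  \begin{cases}
    0, & \lvert y(\mathbf{x}) - t\rvert < \epsilon \\[2pt]
    \lvert y(\mathbf{x}) - t\rvert - \epsilon, & \text{otherwise}
  \end{cases}
\]
% Regularized objective to be minimized:
\[
  \min_{\mathbf{w}}\;\; C\sum_{n=1}^{N} E_{\epsilon}\bigl(y(\mathbf{x}_n) - t_n\bigr) \;+\; \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
\]
```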

19 Dealing with Noise and Outliers
Introduce slack variables
 We now need two slack variables per point, ξ_n ≥ 0 and ξ̂_n ≥ 0.
 A target point lies in the ε-tube if y_n − ε ≤ t_n ≤ y_n + ε.
 The corresponding conditions, together with the rewritten error function, are given below.
Image source: C.M. Bishop

20 Dealing with Noise and Outliers
Optimization with slack variables
 The error function can then be rewritten in terms of the slack variables,
 using the conditions on ξ_n and ξ̂_n from the previous slide (see the reconstruction below).
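A reconstruction of the slack-variable objective and constraints (Bishop, Ch. 7.1.4):

```latex
\[
  \min_{\mathbf{w},b,\boldsymbol{\xi},\hat{\boldsymbol{\xi}}}\;\;
  C\sum_{n=1}^{N}\bigl(\xi_n + \hat{\xi}_n\bigr) \;+\; \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
\]
% subject to the epsilon-tube conditions:
\[
  t_n \le y(\mathbf{x}_n) + \epsilon + \xi_n,
  \qquad
  t_n \ge y(\mathbf{x}_n) - \epsilon - \hat{\xi}_n,
  \qquad
  \xi_n,\hat{\xi}_n \ge 0
\]
```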

21 Support Vector Regression – Primal Form
Lagrangian primal form: introduce multipliers a_n, â_n ≥ 0 for the two ε-tube constraints and μ_n, μ̂_n ≥ 0 for the slack constraints.
Solving for the variables: setting the derivatives with respect to w, b, ξ_n, ξ̂_n to zero gives the stationarity conditions used to eliminate them (see below).
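The Lagrangian and the stationarity conditions, reconstructed following Bishop (writing y_n ≡ y(x_n)):

```latex
\[
  L = C\sum_{n}\bigl(\xi_n + \hat{\xi}_n\bigr) + \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
    - \sum_{n}\bigl(\mu_n\xi_n + \hat{\mu}_n\hat{\xi}_n\bigr)
    - \sum_{n} a_n\bigl(\epsilon + \xi_n + y_n - t_n\bigr)
    - \sum_{n} \hat{a}_n\bigl(\epsilon + \hat{\xi}_n - y_n + t_n\bigr)
\]
% Setting the derivatives to zero:
\[
  \frac{\partial L}{\partial \mathbf{w}} = 0 \;\Rightarrow\; \mathbf{w} = \sum_{n}(a_n - \hat{a}_n)\,\boldsymbol{\phi}(\mathbf{x}_n),
  \qquad
  \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{n}(a_n - \hat{a}_n) = 0
\]
\[
  \frac{\partial L}{\partial \xi_n} = 0 \;\Rightarrow\; a_n + \mu_n = C,
  \qquad
  \frac{\partial L}{\partial \hat{\xi}_n} = 0 \;\Rightarrow\; \hat{a}_n + \hat{\mu}_n = C
\]
```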

22 Support Vector Regression – Dual Form
From this, we can derive the dual form:
 maximize the dual Lagrangian over the multipliers a_n, â_n,
 under box constraints and the equality constraint Σ_n (a_n − â_n) = 0 (see below).
 Predictions for new inputs x are then made using y(x) = Σ_n (a_n − â_n) k(x, x_n) + b.
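The dual objective and its constraints, reconstructed in the usual form:

```latex
\[
  \max_{\mathbf{a},\hat{\mathbf{a}}}\;
  \tilde{L}(\mathbf{a},\hat{\mathbf{a}}) =
  -\tfrac{1}{2}\sum_{n,m}(a_n - \hat{a}_n)(a_m - \hat{a}_m)\,k(\mathbf{x}_n,\mathbf{x}_m)
  -\epsilon\sum_{n}(a_n + \hat{a}_n)
  +\sum_{n}(a_n - \hat{a}_n)\,t_n
\]
% subject to:
\[
  0 \le a_n \le C, \qquad 0 \le \hat{a}_n \le C, \qquad \sum_{n}(a_n - \hat{a}_n) = 0
\]
% Prediction for a new input:
\[
  y(\mathbf{x}) = \sum_{n}(a_n - \hat{a}_n)\,k(\mathbf{x},\mathbf{x}_n) + b
\]
```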

23 KKT Conditions
KKT conditions (see the reconstruction below)
Observations
 A coefficient a_n can only be non-zero if the first constraint is active, i.e., if a point lies either on or above the ε-tube.
 Similarly, a non-zero coefficient â_n requires the point to lie on or below the ε-tube.
 The first two constraints cannot both be active at the same time,
 so either a_n or â_n (or both) must be zero.
 The support vectors are those points for which a_n ≠ 0 or â_n ≠ 0, i.e., the points on the boundary of or outside the ε-tube.
Image source: C.M. Bishop
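The four KKT complementarity conditions referenced above, reconstructed from the primal constraints:

```latex
\[
  a_n\bigl(\epsilon + \xi_n + y_n - t_n\bigr) = 0,
  \qquad
  \hat{a}_n\bigl(\epsilon + \hat{\xi}_n - y_n + t_n\bigr) = 0
\]
\[
  (C - a_n)\,\xi_n = 0,
  \qquad
  (C - \hat{a}_n)\,\hat{\xi}_n = 0
\]
```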

24 Discussion
Slightly different interpretation
 For SVMs, the classification function depends only on the support vectors.
 For SVR, the support vectors mark outlier points; SVR tries to limit the effect of those outliers on the regression function.
 Nevertheless, the prediction y(x) depends only on the support vectors.
(Figure panels: least-squares regression vs. support vector regression.)
Image source: Christoph Lampert
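For illustration only, a short scikit-learn sketch of ε-SVR on toy data with injected outliers; scikit-learn is not referenced in the lecture, and the C, epsilon, and gamma values are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.svm import SVR

# Noisy sine data with a few gross outliers.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 2 * np.pi, size=(80, 1)), axis=0)
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(80)
t[::20] += 2.0  # inject outliers

# epsilon-SVR with an RBF kernel; only points on or outside the
# epsilon-tube end up as support vectors.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=1.0)
svr.fit(X, t)

print("number of support vectors:", len(svr.support_))
y_new = svr.predict(np.linspace(0, 2 * np.pi, 5)[:, None])
```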

25 Example: Head Pose Estimation
Procedure
 Detect faces in the image.
 Compute a gradient representation of the face region.
 Train support vector regression for yaw and tilt (separately).
Y. Li, S. Gong, J. Sherrah, H. Liddell, “Support vector machine based multi-view face detection and recognition”, Image & Vision Computing, 2004.
Slide credit: Christoph Lampert

26 Topics of This Lecture
Kernel Methods for Regression
 Dual representation
 Kernel Ridge Regression
Support Vector Regression
 Recap: SVMs for classification
 Error function
 Primal form
 Dual form
Randomized Regression Forests
 Regression trees
 Optimization functions for regression

27 Regression Forest
Model specialization for regression
 The output/labels are continuous.
 Training node j: the node weak learner splits the incoming training data by optimizing an objective function for node j; the information gain is defined on continuous variables.
 Leaf model: each leaf stores a predictor for the continuous output given an input data point.

28 Regression Forest: The Node Weak Learner Model
Examples of node weak learners (the node test parameters define how the data is split at node j):
 Axis-aligned: the feature response thresholds a single feature axis (2D example).
 Oriented line: the feature response is a generic line in homogeneous coordinates (2D example).
 Conic section: the feature response uses a matrix representing a conic (2D example).
In general, the weak learner may select only a very small subset of the features.
See Appendix C for the relation to the kernel trick.

29 Regression Forest: The Predictor Model
What do we do at the leaf? Examples of leaf (predictor) models:
 Constant predictor.
 Polynomial predictor (note: linear for n = 1, constant for n = 0).
 Probabilistic-linear predictor.

30 Regression Forest: Objective Function
Computing the regression information gain at node j
 Regression information gain: the entropy of the targets at node j minus the weighted entropies of its two children.
 Our regression information gain uses the differential entropy of a Gaussian fitted to the targets at each node (see the reconstruction below).
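A reconstruction of the objective in the usual form from the decision-forests literature; S_j is the training set reaching node j, S_j^L and S_j^R are its left/right splits, and for a 1D Gaussian fit with target variance σ²_y(S) the differential entropy is ½ log(2πe σ²_y(S)). The lecture's exact multivariate form (e.g., for the probabilistic-linear leaf model) may differ.

```latex
\[
  I_j \;=\; H(\mathcal{S}_j) \;-\; \sum_{i\in\{L,R\}} \frac{\lvert\mathcal{S}_j^{\,i}\rvert}{\lvert\mathcal{S}_j\rvert}\, H(\mathcal{S}_j^{\,i}),
  \qquad
  H(\mathcal{S}) \;=\; \tfrac{1}{2}\log\bigl(2\pi e\,\sigma_y^{2}(\mathcal{S})\bigr)
\]
% Maximizing I_j therefore prefers splits whose children have low target variance.
```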

31 Regression Forest: Objective Function
Comparison with Breiman’s error of fit (CART)
 Our information gain vs. CART’s error-of-fit criterion at node j:
 the error-of-fit objective is a special instance of our more general information-theoretic objective function.

32 Regression Forest: The Ensemble Model
The ensemble model
 Each tree t = 1, …, T produces a posterior p_t(y|x); the forest output probability is their average, p(y|x) = (1/T) Σ_t p_t(y|x).
(Figure panels: trees t = 1, 2, 3 and the forest output probability.)
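As a point of comparison only, a scikit-learn sketch of a (non-probabilistic) regression forest whose prediction averages the individual trees, using the T = 400 trees of depth D = 2 from the demo on the next slide. The library and all parameter values are illustrative assumptions, not part of the lecture; scikit-learn's forests use CART's error-of-fit splitting and constant leaf predictors rather than the probabilistic-linear leaf model shown here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy 1D data with a gap, to look at the interpolation behaviour between clusters.
rng = np.random.default_rng(0)
X = np.concatenate([rng.uniform(0, 2, 40), rng.uniform(4, 6, 40)])[:, None]
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(len(X))

# Small, shallow forest: T = 400 trees of maximum depth D = 2.
forest = RandomForestRegressor(n_estimators=400, max_depth=2, random_state=0)
forest.fit(X, t)

X_test = np.linspace(0, 6, 7)[:, None]
mean_pred = forest.predict(X_test)                       # forest output = average over trees
per_tree = np.stack([tree.predict(X_test) for tree in forest.estimators_])
spread = per_tree.std(axis=0)                            # tree disagreement as a rough uncertainty proxy
```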

33 Regression Forest: Probabilistic, Non-Linear Regression
Training and testing different trees in the forest (max tree depth D = 2).
Generalization properties:
 Smooth interpolating behaviour in the gaps between training data.
 Uncertainty increases with distance from the training data.
Parameters: T = 400, D = 2, weak learner = axis-aligned, leaf model = probabilistic line.
(Figure panels: training points, tree posteriors, forest posterior.)
See Appendix A for probabilistic line fitting.

34 References and Further Reading
More information on Support Vector Regression can be found in Chapter 7.1.4 of Bishop’s book. You can also look at Schölkopf & Smola (some chapters are available online).
 Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
 B. Schölkopf, A. Smola, Learning with Kernels, MIT Press, 2002. http://www.learning-with-kernels.org/

