1
Smooth ε-Insensitive Regression by Loss Symmetrization
Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer
School of Computer Science and Engineering, The Hebrew University
{oferd,shais,singer}@cs.huji.ac.il
COLT 2003: The Sixteenth Annual Conference on Learning Theory
2
Before We Begin …
Linear Regression: given a training set {(x_i, y_i)}_{i=1}^m with x_i ∈ R^n and y_i ∈ R, find w ∈ R^n such that ⟨w, x_i⟩ ≈ y_i
Least Squares: minimize Σ_i (⟨w, x_i⟩ - y_i)^2
Support Vector Regression: minimize ||w||^2 s.t. |⟨w, x_i⟩ - y_i| ≤ ε for all i
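As a point of reference for what follows, here is a minimal Python sketch (not from the slides; the toy data, function names, and the hard-constraint reading of SVR are my own simplifications) of the two classical objectives just mentioned, written out for a linear regressor w.

```python
# Sketch: the two classical regression objectives, evaluated for a linear regressor w.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                       # m = 50 instances in R^3
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

def least_squares_loss(w, X, y):
    """Sum of squared discrepancies (w.x_i - y_i)^2."""
    return np.sum((X @ w - y) ** 2)

def eps_insensitive_loss(w, X, y, eps=0.1):
    """SVR-style loss: discrepancies smaller than eps are not penalized."""
    return np.sum(np.maximum(np.abs(X @ w - y) - eps, 0.0))

w0 = np.zeros(3)
print(least_squares_loss(w0, X, y), eps_insensitive_loss(w0, X, y))
```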
3
Loss Symmetrization
Loss functions used in classification boosting, with margin z = y⟨w, x⟩:
–Exp-loss: e^{-z}
–Log-loss: log(1 + e^{-z})
Symmetric versions of these losses can be used for regression, with discrepancy δ = ⟨w, x⟩ - y:
–Symmetric exp-loss: e^{δ-ε} + e^{-δ-ε}
–Symmetric log-loss: log(1 + e^{δ-ε}) + log(1 + e^{-δ-ε})
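A small sketch of the symmetrized losses as read from this slide: each classification loss is applied to both signs of the regression discrepancy δ = ⟨w, x⟩ - y, shifted by the insensitivity parameter ε. The exact shifted form is an assumption on my part.

```python
# Sketch: symmetrized losses evaluated on the regression discrepancy delta = w.x - y.
import numpy as np

def symmetric_exp_loss(delta, eps=0.0):
    # penalizes positive and negative discrepancies symmetrically
    return np.exp(delta - eps) + np.exp(-delta - eps)

def symmetric_log_loss(delta, eps=0.0):
    # np.logaddexp(0, z) = log(1 + e^z), computed in a numerically stable way
    return np.logaddexp(0.0, delta - eps) + np.logaddexp(0.0, -delta - eps)

deltas = np.linspace(-3, 3, 7)
print(symmetric_exp_loss(deltas, eps=0.5))
print(symmetric_log_loss(deltas, eps=0.5))
```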
4
A General Reduction
–Begin with a regression training set {(x_i, y_i)}_{i=1}^m, where x_i ∈ R^n and y_i ∈ R
–Generate 2m classification training examples of dimension n+1, one positive and one negative per regression example, by appending a coordinate built from y_i and ε to each x_i
–Learn an augmented weight vector, while maintaining its last coordinate fixed, by minimizing a margin-based classification loss
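One way this reduction can be realized (a sketch; the particular signs, labels, and the choice of a fixed last coordinate are mine and may differ from the slide's exact convention) is to append -(y_i ∓ ε) to each instance and give the two copies opposite labels. The classification log-loss of the augmented weight vector then coincides with the symmetric ε-insensitive log-loss, which the code checks numerically.

```python
# Sketch of the regression-to-classification reduction (signs/labels are my choice).
import numpy as np

def reduce_to_classification(X, y, eps):
    """Return 2m augmented examples and labels from m regression examples."""
    m, n = X.shape
    X_pos = np.hstack([X, -(y - eps).reshape(-1, 1)])   # label +1
    X_neg = np.hstack([X, -(y + eps).reshape(-1, 1)])   # label -1
    return np.vstack([X_pos, X_neg]), np.hstack([np.ones(m), -np.ones(m)])

def classification_log_loss(lam, Xc, yc):
    return np.sum(np.logaddexp(0.0, -yc * (Xc @ lam)))

def symmetric_log_loss(w, X, y, eps):
    delta = X @ w - y
    return np.sum(np.logaddexp(0.0, delta - eps) + np.logaddexp(0.0, -delta - eps))

rng = np.random.default_rng(1)
X, y, eps = rng.normal(size=(5, 3)), rng.normal(size=5), 0.2
w = rng.normal(size=3)
Xc, yc = reduce_to_classification(X, y, eps)
lam = np.append(w, 1.0)                # last coordinate maintained at 1
print(np.isclose(classification_log_loss(lam, Xc, yc),
                 symmetric_log_loss(w, X, y, eps)))   # True
```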
5
A Batch Algorithm
An illustration of a single batch iteration. Simplifying assumptions (just for the demo):
–Instances are in R (one-dimensional)
–Set ε = 0
–Use the symmetric log-loss
6
Calculate discrepancies and weights: (figure omitted)
7
A Batch Algorithm
Cumulative weights: (figure omitted)
8
Two Batch Algorithms
Update the regressor using either the Log-Additive update or the Additive update: (figure omitted)
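Putting the three steps above together, here is a minimal sketch of one batch iteration on the symmetric log-loss. It follows the same structure (discrepancies → per-example weights → cumulative per-feature weights → update), but the update is a plain gradient step with a hand-picked learning rate, not the Log-Additive or Additive step sizes, whose exact formulas are not reproduced here.

```python
# Sketch of a single batch iteration on the symmetric log-loss (eps = 0 for simplicity).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_iteration(w, X, y, eps=0.0, lr=0.1):
    delta = X @ w - y                 # discrepancies delta_i = w.x_i - y_i
    q_plus = sigmoid(delta - eps)     # weight of the "too high" branch of the loss
    q_minus = sigmoid(-delta - eps)   # weight of the "too low" branch of the loss
    W = X.T @ (q_plus - q_minus)      # cumulative weights, one per feature
    return w - lr * W                 # gradient-style additive update

rng = np.random.default_rng(2)
X = rng.uniform(size=(20, 4))
y = X @ np.array([0.5, -1.0, 2.0, 0.0]) + 0.05 * rng.normal(size=20)
w = np.zeros(4)
for _ in range(200):
    w = batch_iteration(w, X, y)
print(w)
```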
9
Progress Bounds
Theorem (Log-Additive update): a lower bound on the decrease in loss at each iteration (bound omitted)
Theorem (Additive update): a lower bound on the decrease in loss at each iteration (bound omitted)
Lemma: Both bounds are non-negative and equal zero only at the optimum
10
Boosting Regularization
–A new form of regularization for regression and classification boosting (formula omitted)
–Can be implemented by adding pseudo-examples*
* Communicated by Rob Schapire
11
Regularization Contd. – Proof of Convergence
–Regularization ⇒ compactness of the feasible set for w
–Regularization ⇒ a unique attainable optimizer of the loss function
–Progress + compactness + uniqueness = asymptotic convergence to the optimum
12
Exp-loss vs. Log-loss
Two synthetic datasets (figures: regressors learned with the Log-loss and with the Exp-loss)
13
Extensions
–Parallel vs. Sequential updates
 –Parallel: update all elements of w in parallel
 –Sequential: update the weight of a single weak regressor on each round, like classic boosting (see the sketch below)
–Another loss function – the “Combined Loss” (figure: Log-loss, Exp-loss, and Comb-loss)
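The sequential flavor can be sketched as a coordinate-wise update on the symmetric log-loss: pick one coordinate (one "weak regressor") per round and update only its weight. The greedy choice of coordinate and the fixed learning rate are my own simplifications, not the slide's exact rule.

```python
# Sketch of a sequential (one-coordinate-per-round) update on the symmetric log-loss.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sequential_round(w, X, y, eps=0.0, lr=0.1):
    delta = X @ w - y
    grad = X.T @ (sigmoid(delta - eps) - sigmoid(-delta - eps))
    j = int(np.argmax(np.abs(grad)))   # coordinate (weak regressor) chosen this round
    w = w.copy()
    w[j] -= lr * grad[j]
    return w

rng = np.random.default_rng(3)
X = rng.uniform(size=(30, 5))
y = X @ np.array([1.0, 0.0, -0.5, 2.0, 0.0])
w = np.zeros(5)
for _ in range(200):
    w = sequential_round(w, X, y)
print(w)
```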
14
On-line Algorithms
–GD and EG online algorithms for the Log-loss (a GD-style sketch follows below)
–Relative loss bounds

Future Directions
–Regression tree learning
–Solving one-class and various ranking problems using similar constructions
–Regression generalization bounds based on natural regularization
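Referring back to the on-line algorithms bullet, here is a minimal sketch of a GD-style online step on the per-example symmetric ε-insensitive log-loss. The EG variant, which uses multiplicative updates, is not shown, and the learning rate, ε, and data stream are illustrative choices of mine.

```python
# Sketch of an online GD step on the per-example symmetric eps-insensitive log-loss.
import numpy as np

def online_gd_step(w, x, y, eps=0.1, lr=0.2):
    delta = float(np.dot(w, x)) - y
    # derivative of log(1+e^{delta-eps}) + log(1+e^{-delta-eps}) w.r.t. delta
    g = 1.0 / (1.0 + np.exp(-(delta - eps))) - 1.0 / (1.0 + np.exp(delta + eps))
    return w - lr * g * x              # additive (GD) update

rng = np.random.default_rng(4)
w = np.zeros(3)
for _ in range(500):
    x = rng.normal(size=3)
    y = float(np.dot(np.array([1.0, -1.0, 0.5]), x))
    w = online_gd_step(w, x, y)
print(w)
```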