Smooth ε-Insensitive Regression by Loss Symmetrization
Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer
School of Computer Science and Engineering, The Hebrew University

Presentation transcript:

Smooth ε-Insensitive Regression by Loss Symmetrization
Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer
School of Computer Science and Engineering, The Hebrew University
COLT 2003: The Sixteenth Annual Conference on Learning Theory

Before We Begin …
Linear Regression: given a training set {(xᵢ, yᵢ)}ᵢ₌₁..m with xᵢ ∈ ℝⁿ and yᵢ ∈ ℝ, find w ∈ ℝⁿ such that w · xᵢ ≈ yᵢ.
Least Squares: minimize Σᵢ (w · xᵢ − yᵢ)².
Support Vector Regression: minimize ½‖w‖² s.t. |w · xᵢ − yᵢ| ≤ ε for all i.
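A minimal sketch (not part of the slides) contrasting the squared loss with the ε-insensitive loss on a toy one-dimensional problem; the data, the candidate weight, and ε = 0.5 are illustrative assumptions.

```python
import numpy as np

# Toy 1-D regression data (illustrative values, not from the talk).
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9])
w = np.array([1.0])          # candidate linear regressor
eps = 0.5                    # insensitivity parameter (assumed)

residuals = X @ w - y

squared_loss = np.sum(residuals ** 2)                                # least-squares objective
eps_insensitive = np.sum(np.maximum(np.abs(residuals) - eps, 0.0))   # SVR-style loss

print(f"squared loss    : {squared_loss:.3f}")
print(f"eps-insensitive : {eps_insensitive:.3f}")
```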

Loss Symmetrization
Loss functions used in classification boosting, with margin δ = y (w · x):
– Exp-loss: exp(−δ)
– Log-loss: log(1 + exp(−δ))
Symmetric versions of these losses can be used for regression, with discrepancy δ = w · x − y:
– Symmetric Exp-loss: exp(δ − ε) + exp(−δ − ε)
– Symmetric Log-loss: log(1 + exp(δ − ε)) + log(1 + exp(−δ − ε))
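For concreteness, a small sketch of the two symmetric losses as reconstructed above; the toy discrepancy values and ε = 0.1 are illustrative assumptions.

```python
import numpy as np

def symmetric_log_loss(delta, eps):
    """Symmetric log-loss of a discrepancy delta = w.x - y (as reconstructed above)."""
    return np.log1p(np.exp(delta - eps)) + np.log1p(np.exp(-delta - eps))

def symmetric_exp_loss(delta, eps):
    """Symmetric exp-loss of a discrepancy delta = w.x - y."""
    return np.exp(delta - eps) + np.exp(-delta - eps)

deltas = np.linspace(-3, 3, 7)   # illustrative discrepancies
eps = 0.1                        # assumed insensitivity parameter
print(symmetric_log_loss(deltas, eps))
print(symmetric_exp_loss(deltas, eps))
```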

A General Reduction
Begin with a regression training set {(xᵢ, yᵢ)}ᵢ₌₁..m, where xᵢ ∈ ℝⁿ and yᵢ ∈ ℝ.
Generate 2m classification training examples of dimension n+1 (each regression example contributes one positively and one negatively labeled example).
Learn an augmented weight vector in ℝⁿ⁺¹, while maintaining its last coordinate fixed, by minimizing a margin-based classification loss.
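One way to realize this reduction in code, under the loss form reconstructed above; the sign conventions and the choice of appending a shifted target as the extra coordinate are my assumptions, not a verbatim transcription of the slide.

```python
import numpy as np

def reduce_to_classification(X, y, eps):
    """Turn m regression examples into 2m classification examples of dimension n+1.

    Under the construction assumed here, with the augmented vector w_tilde = (w, 1),
    the classification log-loss on the generated examples equals the symmetric
    log-loss of w on the original regression examples.
    """
    m, _ = X.shape
    # Example i with label -1: last coordinate -(y_i + eps).
    X_neg = np.hstack([X, -(y + eps).reshape(-1, 1)])
    # Example i with label +1: last coordinate (eps - y_i).
    X_pos = np.hstack([X, (eps - y).reshape(-1, 1)])
    X_cls = np.vstack([X_neg, X_pos])
    labels = np.hstack([-np.ones(m), np.ones(m)])
    return X_cls, labels
```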

A Batch Algorithm
An illustration of a single batch iteration. Simplifying assumptions (just for the demo):
– instances lie in a low-dimensional space,
– ε is fixed,
– the Symmetric Log-loss is used.

Calculate discrepancies δᵢ = w · xᵢ − yᵢ and the corresponding weights
Wᵢ⁺ = 1 / (1 + exp(ε − δᵢ)),  Wᵢ⁻ = 1 / (1 + exp(ε + δᵢ)),
i.e. the two partial derivatives of the Symmetric Log-loss with respect to the discrepancy.
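A sketch of this step, assuming the weight definitions above (which are just the derivatives of the reconstructed symmetric log-loss).

```python
import numpy as np

def discrepancies_and_weights(X, y, w, eps):
    """Compute discrepancies and the per-example weights used by the batch update.

    The weights are the derivatives of the symmetric log-loss with respect to the
    discrepancy, as reconstructed in this transcript.
    """
    delta = X @ w - y                              # discrepancies
    w_plus = 1.0 / (1.0 + np.exp(eps - delta))     # weight of the "over-shoot" term
    w_minus = 1.0 / (1.0 + np.exp(eps + delta))    # weight of the "under-shoot" term
    return delta, w_plus, w_minus
```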

A Batch Algorithm
Cumulative weights: for each coordinate j, aggregate the per-example weights Wᵢ⁺ and Wᵢ⁻, weighted by the corresponding feature values xᵢ,ⱼ.

Two Batch Algorithms
Update the regressor using either the Log-Additive update or the Additive update; both are driven by the cumulative weights.
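A hedged sketch of one batch iteration using a plain additive, gradient-style update on the symmetric log-loss; the step size and the exact update form are illustrative assumptions and are not the talk's precise Additive or Log-Additive rules.

```python
import numpy as np

def batch_iteration(X, y, w, eps, step=0.1):
    """One batch iteration: per-example weights -> per-coordinate cumulative weights -> update.

    Uses a simple gradient-style additive step as an illustration; the talk's
    Additive and Log-Additive updates have their own specific forms.
    """
    delta = X @ w - y
    w_plus = 1.0 / (1.0 + np.exp(eps - delta))
    w_minus = 1.0 / (1.0 + np.exp(eps + delta))
    # Per-coordinate cumulative weights = gradient of the symmetric log-loss.
    grad = X.T @ (w_plus - w_minus)
    return w - step * grad

# Illustrative usage on toy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
w = np.zeros(3)
for _ in range(100):
    w = batch_iteration(X, y, w, eps=0.1)
print(w)
```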

Progress Bounds
Theorem (Log-Additive update): each iteration decreases the loss by at least a specified non-negative quantity.
Theorem (Additive update): an analogous lower bound on the decrease in loss holds.
Lemma: both bounds are non-negative and equal zero only at the optimum.

Boosting Regularization
A new form of regularization for regression and classification boosting: add a penalty term, scaled by a constant C, to the loss.
Can be implemented by adding pseudo-examples.*
(* Communicated by Rob Schapire)
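One possible reading of the pseudo-example construction (my assumption, not a transcription of the slide): adding, for each coordinate j, a pseudo-example whose instance is the unit vector eⱼ and whose target is 0, weighted by C, contributes a symmetric-log-loss term that depends on wⱼ alone and therefore behaves like a smooth, roughly L1-style penalty.

```latex
% Assumed pseudo-example regularizer: for each coordinate j, add (e_j, 0) with weight C.
% Its contribution to the symmetric log-loss depends on w_j only:
R(\mathbf{w}) \;=\; C \sum_{j=1}^{n}
  \Bigl[ \log\bigl(1 + e^{\,w_j - \varepsilon}\bigr)
       + \log\bigl(1 + e^{\,-w_j - \varepsilon}\bigr) \Bigr]
\;\approx\; C \sum_{j=1}^{n} |w_j| \quad \text{for } |w_j| \gg \varepsilon .
```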

Regularization Contd.
Regularization ⇒ compactness of the feasible set for the regressor.
Regularization ⇒ a unique attainable optimizer of the loss function.
Proof of Convergence: progress + compactness + uniqueness ⇒ asymptotic convergence to the optimum.

Exp-loss vs. Log-loss
[Figure: two synthetic datasets, with panels comparing the Log-loss and Exp-loss fits.]

Extensions
Parallel vs. Sequential updates (a sketch of both schedules follows below):
– Parallel: update all elements of w in parallel.
– Sequential: update the weight of a single weak regressor on each round (like classic boosting).
Another loss function: the "Combined Loss".
[Figure: the Log-loss, Exp-loss, and Combined-loss curves.]
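A hedged sketch contrasting the two update schedules using the same gradient-style step as before; the step rule is an illustrative assumption, not the talk's exact update.

```python
import numpy as np

def weights(X, y, w, eps):
    """Per-example weights of the symmetric log-loss (as reconstructed above)."""
    delta = X @ w - y
    return 1.0 / (1.0 + np.exp(eps - delta)), 1.0 / (1.0 + np.exp(eps + delta))

def parallel_update(X, y, w, eps, step=0.1):
    """Parallel schedule: update all coordinates of w at once."""
    wp, wm = weights(X, y, w, eps)
    return w - step * (X.T @ (wp - wm))

def sequential_update(X, y, w, eps, j, step=0.1):
    """Sequential schedule: update only coordinate j (one weak regressor per round)."""
    wp, wm = weights(X, y, w, eps)
    w = w.copy()
    w[j] -= step * (X[:, j] @ (wp - wm))
    return w
```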

On-line Algorithms
GD and EG online algorithms for the Log-loss.
Relative loss bounds.

Future Directions
Regression tree learning.
Solving one-class and various ranking problems using similar constructions.
Regression generalization bounds based on natural regularization.