Ensemble Learning (1): Boosting, AdaBoost, Boosting as an Additive Model


Lecture 6: Ensemble Learning (1). Topics: boosting, AdaBoost, boosting as an additive model, a brief intro to the lasso, and the relationship between boosting and the lasso.

Boosting Combine multiple classifiers: construct a sequence of weak classifiers and combine them into a strong classifier by a weighted majority vote. “Weak” means only slightly better than random coin-tossing. Some properties: flexible; able to select features; good generalization; but it can fit noise.
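As an illustration of the weighted-majority-vote idea, here is a minimal sketch (not from the slides; the predictions and weights below are arbitrary placeholders):

```python
import numpy as np

def weighted_majority_vote(weak_preds, alphas):
    """Combine weak classifiers' +/-1 predictions by a weighted vote.

    weak_preds: (M, n) array, each row one weak classifier's predictions
                in {-1, +1} on n samples.
    alphas:     (M,) array, the weight given to each weak classifier.
    """
    scores = alphas @ weak_preds      # weighted sum of the votes
    return np.sign(scores)            # final +/-1 decision

# Toy usage: three weak classifiers voting on four samples.
preds = np.array([[ 1, -1,  1,  1],
                  [ 1,  1, -1,  1],
                  [-1,  1,  1, -1]])
alphas = np.array([0.9, 0.5, 0.3])
print(weighted_majority_vote(preds, alphas))   # -> [ 1. -1.  1.  1.]
```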

Boosting AdaBoost (Freund & Schapire, 1995).

Boosting Several slides of figures from “A Tutorial on Boosting”, Yoav Freund and Rob Schapire.


Boosting Two kinds of weights appear in the algorithm: one is the weight of the current weak classifier in the final model; the other is a weight on each individual observation, accumulated from step 1 onward. If an observation is correctly classified at this step, its weight does not change; if it is misclassified, its weight increases.
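The algorithm these annotations refer to appears only as an image on the slide; a minimal AdaBoost.M1 sketch in the same spirit (assuming labels in {-1, +1} and scikit-learn decision stumps as weak learners, which is an assumption, not the slides' code) is:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_m1(X, y, M=50):
    """Minimal AdaBoost.M1 sketch; y must take values in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                  # observation weights, uniform at step 1
    stumps, betas = [], []
    for _ in range(M):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)   # weighted error rate
        if err >= 0.5 or err == 0.0:                # not a usable weak classifier
            break
        beta = 0.5 * np.log((1 - err) / err)        # weight of this classifier in the final model
        w *= np.exp(-beta * y * pred)               # misclassified points get up-weighted
        w /= w.sum()                                # renormalize
        stumps.append(stump)
        betas.append(beta)
    return stumps, np.array(betas)

def adaboost_predict(stumps, betas, X):
    """Weighted majority vote of the fitted weak classifiers."""
    score = sum(b * s.predict(X) for s, b in zip(stumps, betas))
    return np.sign(score)
```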


Boosting Example with 10 predictors. The weak classifier is a stump: a tree with a single split (two terminal nodes).
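A hedged reconstruction of this kind of experiment (the dataset on the slide is not in the transcript; the nested-spheres simulation with 10 standard-normal predictors from Hastie et al., Ch. 10, is assumed here):

```python
import numpy as np
from scipy.stats import chi2
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def make_data(n):
    # 10 independent N(0,1) predictors; y = +1 if the squared radius exceeds
    # the chi-squared(10) median, else -1.
    X = rng.standard_normal((n, 10))
    y = np.where((X ** 2).sum(axis=1) > chi2.ppf(0.5, df=10), 1, -1)
    return X, y

X_train, y_train = make_data(2000)
X_test, y_test = make_data(10000)

# Boost stumps (depth-1 trees) for a few hundred rounds.
# Note: the keyword is `estimator` in recent scikit-learn, `base_estimator` in older versions.
model = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                           n_estimators=400)
model.fit(X_train, y_train)
print("test error:", 1 - model.score(X_test, y_test))
```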

Boosting Boosting can be seen as fitting an additive model of a general form built from expansion coefficients and basis functions, where each basis function is a simple function of the feature vector x with parameters γ. Examples of γ: the weights of a sigmoidal function in a neural network; the split variable and split point in a tree model.
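The general form on the slide is presumably the standard basis expansion from Hastie et al. (Ch. 10):

f(x) = \sum_{m=1}^{M} \beta_m \, b(x; \gamma_m)

with expansion coefficients \beta_m and basis functions b(x; \gamma_m).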

Boosting In general, such models are fit by minimizing a loss function over the training data, jointly over all coefficients and parameters, which can be computationally intensive. An alternative is to proceed stepwise, fitting the sub-problem of a single basis function at a time.
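The two problems referred to are presumably, in the same notation,

\min_{\{\beta_m, \gamma_m\}_{1}^{M}} \sum_{i=1}^{N} L\Big(y_i, \sum_{m=1}^{M} \beta_m b(x_i; \gamma_m)\Big)

(the full joint fit), and the single-basis sub-problem

\min_{\beta, \gamma} \sum_{i=1}^{N} L\big(y_i, \beta\, b(x_i; \gamma)\big).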

Boosting Forward stagewise additive modeling: add new basis functions one at a time, without adjusting the ones previously added. Example: with squared-error loss, each new basis function simply fits the current residuals. (Note that the squared loss function is not a good choice for classification.)
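The update and the squared-error example (shown as equations on the slide) are presumably the standard ones. At step m,

(\beta_m, \gamma_m) = \arg\min_{\beta, \gamma} \sum_{i=1}^{N} L\big(y_i, f_{m-1}(x_i) + \beta\, b(x_i; \gamma)\big), \qquad f_m(x) = f_{m-1}(x) + \beta_m b(x; \gamma_m).

With squared-error loss,

L\big(y_i, f_{m-1}(x_i) + \beta b(x_i; \gamma)\big) = \big(y_i - f_{m-1}(x_i) - \beta b(x_i; \gamma)\big)^2 = \big(r_{im} - \beta b(x_i; \gamma)\big)^2,

so the new basis function fits the current residuals r_{im} = y_i - f_{m-1}(x_i).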

Boosting The version of AdaBoost we discussed uses the exponential loss function, and the basis functions are the individual weak classifiers.
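The loss function referred to is the exponential loss,

L\big(y, f(x)\big) = \exp\big(-y\, f(x)\big),

with f(x) = \sum_m \beta_m G_m(x) and each weak classifier G_m(x) \in \{-1, +1\}.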

Boosting Margin: y·f(x). It is positive for a correct classification and negative for an incorrect one. The goal of classification is to produce margins that are as positive as possible, so negative margins should be penalized more; the exponential loss penalizes negative margins more heavily.
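Viewed as functions of the margin m = y f(x), the misclassification and exponential losses are

L_{0/1}(m) = I(m < 0), \qquad L_{\exp}(m) = e^{-m},

so the exponential loss keeps growing as the margin becomes more negative instead of flattening at 1.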

Boosting To be solved at each step: minimize the exponential loss jointly over the coefficient β and the new weak classifier G. The observation weights that appear depend only on the fit so far, so they are independent of β and G.
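The problem on the slide is presumably the standard one:

(\beta_m, G_m) = \arg\min_{\beta, G} \sum_{i=1}^{N} \exp\big(-y_i\,(f_{m-1}(x_i) + \beta G(x_i))\big) = \arg\min_{\beta, G} \sum_{i=1}^{N} w_i^{(m)} \exp\big(-\beta\, y_i G(x_i)\big),

where w_i^{(m)} = \exp\big(-y_i f_{m-1}(x_i)\big) depends only on the previous fit and is therefore independent of β and G.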

Boosting Observations are either correctly or incorrectly classified, so the target function splits into a correctly-classified term and an incorrectly-classified term. For any β > 0, minimizing it over G requires that G_m be the classifier that minimizes the weighted error rate.
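Splitting the sum by correct and incorrect classifications gives

\sum_i w_i^{(m)} e^{-\beta y_i G(x_i)} = e^{-\beta} \sum_{y_i = G(x_i)} w_i^{(m)} + e^{\beta} \sum_{y_i \ne G(x_i)} w_i^{(m)} = (e^{\beta} - e^{-\beta}) \sum_i w_i^{(m)} I\big(y_i \ne G(x_i)\big) + e^{-\beta} \sum_i w_i^{(m)},

so for any β > 0,

G_m = \arg\min_G \sum_i w_i^{(m)} I\big(y_i \ne G(x_i)\big).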

Boosting Plugging G_m back in yields its weighted error rate; plugging that into the objective and solving gives β_m. The overall classifier is then updated by adding the new weighted term.
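The standard expressions (following Hastie et al.) are

\mathrm{err}_m = \frac{\sum_i w_i^{(m)} I\big(y_i \ne G_m(x_i)\big)}{\sum_i w_i^{(m)}}, \qquad \beta_m = \frac{1}{2} \log\frac{1 - \mathrm{err}_m}{\mathrm{err}_m}, \qquad f_m(x) = f_{m-1}(x) + \beta_m G_m(x).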

Boosting The observation weights for the next iteration follow by multiplying in the new term. Rewriting the exponent in terms of the misclassification indicator leaves a factor that is independent of i; it rescales all weights equally and can be ignored.
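In formulas (again the standard derivation):

w_i^{(m+1)} = w_i^{(m)} \exp\big(-\beta_m y_i G_m(x_i)\big).

Using -y_i G_m(x_i) = 2\, I\big(y_i \ne G_m(x_i)\big) - 1, this becomes

w_i^{(m+1)} = w_i^{(m)} \exp\big(\alpha_m I(y_i \ne G_m(x_i))\big) \cdot e^{-\beta_m}, \qquad \alpha_m = 2\beta_m,

and the factor e^{-\beta_m} is the same for every i, so it can be ignored.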

Lasso The lasso, in its equivalent Lagrangian form, penalizes the sum of the absolute values of the coefficients; ridge regression penalizes the sum of their squares; the elastic net combines the two penalties.
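The objectives on the slide are presumably the standard penalized forms:

\hat\beta^{\mathrm{lasso}} = \arg\min_{\beta} \; \frac{1}{2}\sum_{i=1}^{N}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|,

ridge regression with penalty \lambda \sum_j \beta_j^2, and the elastic net with penalty \lambda \sum_j \big(\alpha \beta_j^2 + (1-\alpha)|\beta_j|\big).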


Lasso With orthogonal predictors x, the lasso has an explicit solution in terms of the least squares estimates.
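The explicit form is presumably the standard soft-thresholding result for an orthonormal design:

\hat\beta_j^{\mathrm{lasso}} = \mathrm{sign}(\hat\beta_j)\,\big(|\hat\beta_j| - \lambda\big)_+,

where the \hat\beta_j are the least squares estimates; ridge regression, by contrast, shrinks them proportionally to \hat\beta_j/(1+\lambda).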

Lasso Figure: error contours in parameter space, with the lasso constraint region (a diamond) and the ridge constraint region (a circle); the diamond's corners on the axes are why the lasso can set coefficients exactly to zero.

Boosted linear regression {T_k}: a collection of basis functions.

Boosted linear regression Here the T's are the X's themselves, i.e. we are in a linear regression setting; a sketch of the procedure follows below.
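A minimal sketch of boosting in this setting (incremental forward stagewise linear regression; the step size eps and the number of steps are arbitrary choices, not values from the slides):

```python
import numpy as np

def forward_stagewise(X, y, eps=0.01, n_steps=5000):
    """Incremental forward stagewise ("boosted") linear regression.

    At every step, pick the predictor most correlated with the current
    residual and move its coefficient by a small amount eps in that
    direction, never readjusting earlier updates.
    Assumes standardized columns of X and a centered y.
    """
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.astype(float).copy()
    for _ in range(n_steps):
        corr = X.T @ residual                # inner products with the residual
        j = np.argmax(np.abs(corr))          # most correlated predictor
        step = eps * np.sign(corr[j])
        beta[j] += step                      # tiny, never-revisited update
        residual -= step * X[:, j]           # update the residual
    return beta
```

With standardized X and centered y, letting eps shrink toward zero makes the coefficient paths closely resemble the lasso paths, which is presumably the relationship between boosting and the lasso referred to at the start of the lecture.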