Boosting and Additive Trees

Boosting and Additive Trees (Chapter 10)

Boosting Methods The motivation for boosting was a procedure that combines the outputs of many “weak” classifiers to produce a powerful “committee.”

Boosting Methods We begin by describing the most popular boosting algorithm due to Freund and Schapire (1997) called “AdaBoost.M1.”

Boosting Methods Yoav Freund is an Israeli professor of computer science at the University of California, San Diego, who works mainly on machine learning, probability theory, and related fields and applications. Robert Elias Schapire is the David M. Siegel '83 Professor in the Computer Science Department at Princeton University. His primary specialty is theoretical and applied machine learning.

Boosting Methods Consider a two-class problem, with the output variable coded as $Y \in \{-1, 1\}$. Given a vector of predictor variables $X$, a classifier $G(X)$ produces a prediction taking one of the two values $\{-1, 1\}$. The error rate on the training sample is $\overline{\mathrm{err}} = \frac{1}{N}\sum_{i=1}^{N} I\big(y_i \neq G(x_i)\big)$.

Boosting Methods The expected error rate on future predictions is $E_{XY}\, I\big(Y \neq G(X)\big)$.

Boosting Methods

Boosting Methods The predictions from all of the weak classifiers $G_m(x)$, $m = 1, \ldots, M$, are then combined through a weighted majority vote to produce the final prediction: $G(x) = \operatorname{sign}\!\left(\sum_{m=1}^{M} \alpha_m G_m(x)\right)$. The weights $\alpha_m$ are computed by the boosting algorithm and give higher influence to the more accurate classifiers in the sequence.
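For concreteness, here is a minimal AdaBoost.M1 sketch in Python (an illustration added to the transcript, not part of the original slides); it uses scikit-learn decision stumps as the weak classifier, and the function names and the choice of M are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_m1(X, y, M=400):
    """Minimal AdaBoost.M1; the labels y must be coded as +1/-1."""
    N = len(y)
    w = np.full(N, 1.0 / N)                    # observation weights
    stumps, alphas = [], []
    for m in range(M):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)       # weak classifier on weighted data
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        err = np.clip(err, 1e-10, 1 - 1e-10)   # guard against log(0)
        alpha = np.log((1 - err) / err)        # classifier weight alpha_m
        w *= np.exp(alpha * (pred != y))       # up-weight misclassified observations
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Weighted majority vote: sign of the weighted sum of stump predictions."""
    agg = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(agg)
```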

Boosting Methods The AdaBoost.M1 algorithm is known as “Discrete AdaBoost” in Friedman et al. (2000), because the base classifier returns a discrete class label. If the base classifier instead returns a real-valued prediction (e.g., a probability mapped to the interval [−1, 1]), AdaBoost can be modified appropriately (see “Real AdaBoost” in Friedman et al. (2000)).

Boosting Methods The features $X_1, \ldots, X_{10}$ are standard independent Gaussian, and the deterministic target $Y$ is defined by $Y = 1$ if $\sum_{j=1}^{10} X_j^2 > \chi^2_{10}(0.5)$ and $Y = -1$ otherwise. Here $\chi^2_{10}(0.5) = 9.34$ is the median of a chi-squared random variable with 10 degrees of freedom (the sum of squares of 10 standard Gaussians).
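A short sketch of how this simulated example can be generated in Python (added for illustration; the function name, seeds, and sample sizes below are assumptions):

```python
import numpy as np
from scipy.stats import chi2

def make_chi2_data(n, p=10, seed=0):
    """Simulated example: Y = +1 if the sum of squared features exceeds
    the chi-squared(10) median, and Y = -1 otherwise."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    threshold = chi2.ppf(0.5, df=p)            # = 9.34 for p = 10
    y = np.where((X ** 2).sum(axis=1) > threshold, 1, -1)
    return X, y

X_train, y_train = make_chi2_data(2000)
X_test, y_test = make_chi2_data(10000, seed=1)
```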

Boosting Methods

Boosting Fits an Additive Model Boosting is a way of fitting an additive expansion in a set of elementary “basis” functions. Here the basis functions are the individual classifiers $G_m(x) \in \{-1, 1\}$. More generally, basis function expansions take the form $f(x) = \sum_{m=1}^{M} \beta_m\, b(x; \gamma_m)$, where the $\beta_m$ are expansion coefficients and $b(x; \gamma)$ are simple functions of the multivariate argument $x$, characterized by a set of parameters $\gamma$.

Boosting Fits an Additive Model Typically these models are fit by minimizing a loss function averaged over the training data, such as the squared-error or a likelihood-based loss function: $\min_{\{\beta_m, \gamma_m\}_1^M} \sum_{i=1}^{N} L\!\left(y_i, \sum_{m=1}^{M} \beta_m\, b(x_i; \gamma_m)\right)$ (10.4).

Forward Stagewise Additive Modeling Forward stagewise modeling approximates the solution to (10.4) by sequentially adding new basis functions to the expansion without adjusting the parameters and coefficients of those that have already been added.

Forward Stagewise Additive Modeling For squared-error loss, $L\big(y_i, f_{m-1}(x_i) + \beta\, b(x_i; \gamma)\big) = \big(y_i - f_{m-1}(x_i) - \beta\, b(x_i; \gamma)\big)^2 = \big(r_{im} - \beta\, b(x_i; \gamma)\big)^2$, where $r_{im} = y_i - f_{m-1}(x_i)$ is the residual of the current model on the $i$th observation. Thus the term that best fits the current residuals is added to the expansion at each step.
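As a sketch (added here, not from the slides), forward stagewise fitting under squared-error loss simply fits each new basis function, here a one-split regression tree, to the current residuals; previously added terms are never readjusted.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def forward_stagewise_ls(X, y, M=100):
    """Forward stagewise additive modeling under squared-error loss."""
    f = np.zeros(len(y))                 # current fit f_{m-1}(x_i)
    basis = []
    for m in range(M):
        r = y - f                        # residuals r_im
        tree = DecisionTreeRegressor(max_depth=1)
        tree.fit(X, r)                   # new basis function fit to residuals
        f += tree.predict(X)             # add it to the expansion, earlier terms fixed
        basis.append(tree)
    return basis

def predict_additive(basis, X):
    return sum(tree.predict(X) for tree in basis)
```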

Exponential Loss and Adaboost We now show that AdaBoost.M1 is equivalent to forward stagewise additive modeling using the loss function $L\big(y, f(x)\big) = \exp\big(-y\, f(x)\big)$.
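In outline (reconstructed here from the standard argument, not transcribed from the slide images), the stagewise minimization with exponential loss separates into the AdaBoost updates:

```latex
% Step m of forward stagewise modeling with exponential loss:
(\beta_m, G_m)
  = \arg\min_{\beta, G} \sum_{i=1}^{N} \exp\!\big[-y_i\big(f_{m-1}(x_i) + \beta\, G(x_i)\big)\big]
  = \arg\min_{\beta, G} \sum_{i=1}^{N} w_i^{(m)} \exp\!\big[-\beta\, y_i\, G(x_i)\big],
  \qquad w_i^{(m)} = \exp\!\big[-y_i f_{m-1}(x_i)\big].

% For any fixed beta > 0, the minimizing classifier has the smallest weighted error:
G_m = \arg\min_{G} \sum_{i=1}^{N} w_i^{(m)}\, I\big(y_i \neq G(x_i)\big),

% and solving for beta then gives
\beta_m = \tfrac{1}{2}\,\log\frac{1 - \mathrm{err}_m}{\mathrm{err}_m},
\qquad
\mathrm{err}_m = \frac{\sum_{i} w_i^{(m)} I\big(y_i \neq G_m(x_i)\big)}{\sum_{i} w_i^{(m)}}.
```

Up to the factor of one-half in $\beta_m$, these are exactly the classifier and weight updates of AdaBoost.M1.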

Why Exponential Loss?

Loss Functions and Robustness In this section we examine the different loss functions for classification and regression more closely, and characterize them in terms of their robustness to extreme data.

Robust Loss Functions for Classification Also shown in the figure is squared-error loss. The minimizer of the corresponding risk on the population is $f^*(x) = E(Y \mid x) = 2\Pr(Y = 1 \mid x) - 1$. As before, the classification rule is $G(x) = \operatorname{sign}[f(x)]$. Squared-error loss is not a good surrogate for misclassification error: it is not monotone decreasing in the margin $y f(x)$, and for margins greater than 1 it grows quadratically, placing increasing influence on observations that are already correctly classified with high certainty.

Robust Loss Functions for Classification The binomial deviance extends naturally to the K-class multinomial deviance loss function: $L\big(y, p(x)\big) = -\sum_{k=1}^{K} I(y = \mathcal{G}_k) \log p_k(x) = -\sum_{k=1}^{K} I(y = \mathcal{G}_k) f_k(x) + \log\!\left(\sum_{\ell=1}^{K} e^{f_\ell(x)}\right)$, where $p_k(x) = e^{f_k(x)} / \sum_{\ell=1}^{K} e^{f_\ell(x)}$. As in the two-class case, the criterion penalizes incorrect predictions only linearly in their degree of incorrectness.
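A minimal sketch of this loss as a function of the class scores $f_k(x)$ for a single observation (the function and variable names are illustrative):

```python
import numpy as np

def multinomial_deviance(f, y):
    """Multinomial deviance for one observation.
    f : array of K class scores f_k(x);  y : integer index of the true class."""
    log_norm = np.logaddexp.reduce(f)    # log(sum_l exp(f_l(x))), computed stably
    return log_norm - f[y]               # = -f_y(x) + log(sum_l exp(f_l(x)))
```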

Robust Loss Functions for Regression The Huber loss criterion used for M-regression combines the sensitivity of squared-error loss for small residuals with the robustness of absolute loss for large ones: $L\big(y, f(x)\big) = \big(y - f(x)\big)^2$ for $|y - f(x)| \le \delta$, and $L\big(y, f(x)\big) = 2\delta\,|y - f(x)| - \delta^2$ otherwise.
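A direct sketch of the criterion (added for illustration; in gradient boosting the cutoff delta is typically set adaptively, e.g., to a quantile of the current absolute residuals):

```python
import numpy as np

def huber_loss(y, f, delta=1.0):
    """Huber loss: squared error for small residuals,
    linear (and hence robust) beyond the cutoff delta."""
    r = np.abs(y - f)
    return np.where(r <= delta, r ** 2, 2 * delta * r - delta ** 2)
```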

“Off-the-Shelf” Procedures for Data Mining Predictive learning is an important aspect of data mining. As can be seen from this book, a wide variety of methods have been developed for predictive learning from data. For each particular method there are situations for which it is particularly well suited, and others where it performs badly compared to the best that can be done with that data.

“Off-the-Shelf” Procedures for Data Mining These requirements of speed, interpretability and the messy nature of the data sharply limit the usefulness of most learning procedures as off-the-shelf methods for data mining. An “off-the-shelf” method is one that can be directly applied to the data without requiring a great deal of time-consuming data preprocessing or careful tuning of the learning procedure.

Example: Spam Data Applying gradient boosting to these data resulted in a test error rate of 4.5%, using the same test set as was used in Section 9.1.2. By comparison, an additive logistic regression achieved 5.5%, a CART tree fully grown and pruned by cross-validation 8.7%, and MARS 5.5%. The standard error of these estimates is around 0.6%, although gradient boosting is significantly better than all of them using the McNemar test.
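A minimal sketch of such a fit with scikit-learn's gradient boosting (added for illustration; fetching the Spambase data from OpenML, the train/test split, and the hyperparameters are assumptions, so the resulting error rate need not reproduce the figures quoted above):

```python
from sklearn.datasets import fetch_openml
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Spambase: 57 features derived from e-mail text, binary spam / non-spam label
X, y = fetch_openml("spambase", version=1, return_X_y=True, as_frame=False)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=500, max_depth=3, learning_rate=0.05)
gbm.fit(X_train, y_train)
print("test error rate:", 1 - gbm.score(X_test, y_test))
```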

Boosting Trees

Numerical Optimization via Gradient Boosting

Steepest Descent

Gradient Boosting
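As a hedged illustration of generic gradient tree boosting: each step fits a regression tree to the negative gradient of the loss at the current fit, re-estimates the leaf constants by the loss minimizer within each leaf, and adds the shrunken tree to the model. The sketch below does this for absolute-error (LAD) loss, where the negative gradient is sign(y − f) and the leaf constants are medians of the residuals; all names and settings are assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_lad(X, y, M=100, nu=0.1, max_depth=3):
    """Sketch of gradient tree boosting for absolute-error (LAD) loss."""
    f0 = np.median(y)                          # f_0: constant minimizing sum |y - c|
    f = np.full(len(y), f0)
    ensemble = []
    for m in range(M):
        g = np.sign(y - f)                     # negative gradient (pseudo-residuals)
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, g)
        leaves = tree.apply(X)                 # terminal region of each observation
        gamma = {j: np.median((y - f)[leaves == j]) for j in np.unique(leaves)}
        f += nu * np.array([gamma[j] for j in leaves])   # shrunken update
        ensemble.append((tree, gamma))
    return f0, ensemble

def predict_lad(f0, ensemble, X, nu=0.1):
    """Prediction: start from f_0 and add each shrunken tree's leaf constants
    (nu must match the value used when fitting)."""
    pred = np.full(X.shape[0], f0)
    for tree, gamma in ensemble:
        leaves = tree.apply(X)
        pred += nu * np.array([gamma[j] for j in leaves])
    return pred
```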

Implementations of Gradient Boosting

Right-Sized Trees for Boosting

Regularization

Shrinkage

Subsampling

Interpretation

Relative Importance of Predictor Variables

Partial Dependence Plots

Illustrations: California Housing

Illustrations

The End Thanks~