CS 189 Brian Chu Slides at: brianchu.com/ml/ Office Hours: Cory 246, 6-7p Mon. (hackerspace lounge)
Agenda
– Random forests
– Bias vs. variance revisited
– Worksheet
HW Tip
– Random forests are “embarrassingly parallel”: train the trees concurrently with Python multiprocessing (a sketch follows below)
– Spam dataset class 0 frequency: 0.71
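A minimal sketch of the tip, assuming scikit-learn is installed; `train_one_tree` and the random placeholder data are illustrative, not part of the assignment. Each worker process fits one tree on its own bootstrap sample, and the forest predicts by majority vote.

```python
# Hedged sketch: parallel bootstrap-trained trees via multiprocessing.
# Assumes scikit-learn; X, y, and train_one_tree are placeholders, not HW code.
import numpy as np
from multiprocessing import Pool
from sklearn.tree import DecisionTreeClassifier

def train_one_tree(args):
    X, y, seed = args
    rng = np.random.RandomState(seed)
    idx = rng.choice(len(X), size=len(X), replace=True)   # bootstrap sample
    tree = DecisionTreeClassifier(random_state=seed)
    tree.fit(X[idx], y[idx])
    return tree

if __name__ == "__main__":
    rng = np.random.RandomState(0)
    X = rng.rand(1000, 20)                        # placeholder features
    y = (rng.rand(1000) > 0.71).astype(int)       # placeholder labels (~71% class 0)
    with Pool() as pool:                          # one tree per task, spread over cores
        forest = pool.map(train_one_tree, [(X, y, s) for s in range(16)])
    votes = np.array([t.predict(X) for t in forest])
    majority = (votes.mean(axis=0) > 0.5).astype(int)     # majority vote
    print("training accuracy:", (majority == y).mean())
```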
Random forests
– Why do we use the bootstrap? To de-correlate the trees (reduce variance)
– “Sampling with replacement behaves on the original sample the way the original sample behaves on a population” (a numerical illustration follows below)
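A small numerical illustration of the quoted idea (not from the slides), assuming NumPy and a synthetic population: the spread of the sample mean across bootstrap resamples of one sample roughly matches its spread across fresh samples drawn from the population.

```python
# Hedged illustration: bootstrap resamples of one sample mimic how fresh
# samples behave on the population (here, the sample mean's standard error).
import numpy as np

rng = np.random.RandomState(0)
population = rng.normal(loc=0.0, scale=1.0, size=100_000)
sample = rng.choice(population, size=200, replace=False)   # the "original sample"

# "Original sample behaves on a population": means of fresh size-200 samples.
fresh_means = [rng.choice(population, 200, replace=False).mean() for _ in range(2000)]
# "Sampling with replacement on the original sample": bootstrap means.
boot_means = [rng.choice(sample, 200, replace=True).mean() for _ in range(2000)]

print("fresh-sample SE:", np.std(fresh_means))   # both should be close to 1/sqrt(200)
print("bootstrap SE:   ", np.std(boot_means))
```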
Bias vs. variance revisited
– Decision trees grown to a large depth are very prone to overfit: low bias, high variance
– A decision “stump” with a max depth of 2 does not overfit; it is not complex enough: high bias, low variance (see the comparison below)
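A hedged comparison of the two regimes, assuming scikit-learn and synthetic noisy data (illustrative only): an unlimited-depth tree nearly memorizes the training set, while a depth-2 tree underfits; the gap between train and test accuracy makes the difference visible.

```python
# Hedged sketch: deep tree (low bias, high variance) vs. depth-2 tree
# (high bias, low variance) on synthetic noisy data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(2000, 10)
y = (X[:, 0] + 0.3 * rng.randn(2000) > 0.5).astype(int)    # noisy labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (None, 2):                                    # unlimited depth vs. shallow tree
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print("max_depth =", depth,
          "| train acc:", round(tree.score(X_tr, y_tr), 3),
          "| test acc:", round(tree.score(X_te, y_te), 3))
```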
Bias vs. variance revisited
Random forest: take a bunch of low-bias, high-variance trees and try to lower the variance
– Bias is already low, so don’t worry about it; attack the variance
– (by parallel training with randomization, then taking a majority vote)
– The randomization attacks the variance
Boosting: train a bunch of high-bias, low-variance learners and try to lower the bias
– Variance is already low, so don’t worry about it; attack the bias
– (by sequential training with re-weighting, then taking a weighted-average classification)
– The re-weighting attacks the bias
Boosting can be used with any learner, ideally a weak learner (common variant: linear SVMs); a minimal boosting sketch follows below
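A minimal AdaBoost-style sketch of the boosting recipe on this slide (sequential re-weighting, then a weighted vote), assuming scikit-learn decision stumps as the weak learner rather than the linear SVMs mentioned above; labels are taken to be in {-1, +1}, and the function names `adaboost` and `boosted_predict` are illustrative.

```python
# Hedged AdaBoost-style sketch: re-weight examples after each round so the
# next stump focuses on current mistakes; combine stumps by a weighted vote.
# Assumes scikit-learn stumps and labels in {-1, +1}; names are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=50):
    n = len(y)
    w = np.full(n, 1.0 / n)                       # start with uniform example weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)          # weak learner trained on weighted data
        pred = stump.predict(X)
        err = np.sum(w[pred != y]) / np.sum(w)    # weighted error rate
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))   # this stump's vote weight
        w *= np.exp(-alpha * y * pred)            # up-weight the examples it got wrong
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def boosted_predict(stumps, alphas, X):
    # Weighted average of the weak learners' +/-1 outputs, then take the sign.
    score = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(score)

# Example usage (labels must be +/-1, e.g. y_pm1 = 2 * y01 - 1):
#   stumps, alphas = adaboost(X_train, y_train_pm1)
#   y_pred = boosted_predict(stumps, alphas, X_test)
```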
Random forests and boosting
– Both are “ensemble” methods
– Both are among the most widely used ML algorithms in industry (the standard for fraud/spam detection); neural nets are not used for fraud/spam-type tasks
– In practice: random forests work better out-of-the-box (less tuning), but with tuning, boosting usually performs better
– Most classification Kaggle competitions are won by: 1) boosting, or 2) neural nets
Cool places RF/Boosting is used
– effective-boosting-methods/answer/Tao-Xu (boosting)
– yPartRecognition.pdf (Kinect, RF)
– forest-classifier/ (RF)
– k.pdf (boosting + logistic reg.)
– Twitter, etc.
Next time: NEURAL NETWORKS