CS 189 Brian Chu
Slides at: brianchu.com/ml/
Office Hours: Cory 246, 6-7p Mon. (hackerspace lounge)

Agenda
- Random forests
- Bias vs. variance revisited
- Worksheet

HW Tip
- Random forests are "embarrassingly parallel": each tree is trained independently, so you can fan the work out with Python multiprocessing (see the sketch below).
- Spam class 0 frequency: 0.71
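A minimal sketch of that tip (my own illustration, not from the slides), assuming scikit-learn trees and a synthetic stand-in for the spam data:

```python
# Sketch: training a random forest's trees in parallel with multiprocessing.
# The dataset and hyperparameters below are placeholders, not the HW's.
import numpy as np
from multiprocessing import Pool
from sklearn.tree import DecisionTreeClassifier

def fit_one_tree(seed, X, y):
    """Fit a single tree on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=seed)
    return tree.fit(X[idx], y[idx])

if __name__ == "__main__":
    X = np.random.rand(1000, 20)                    # stand-in features
    y = (np.random.rand(1000) < 0.29).astype(int)   # ~0.71 class-0 frequency
    with Pool() as pool:                            # one worker per CPU core
        trees = pool.starmap(fit_one_tree, [(s, X, y) for s in range(100)])
    # Majority vote over the ensemble:
    votes = np.mean([t.predict(X) for t in trees], axis=0)
    preds = (votes > 0.5).astype(int)
```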

Random forests
- Why do we use bootstrap? To de-correlate the trees (reduce variance).
- "Sampling with replacement behaves on the original sample the way the original sample behaves on a population."
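To make the bootstrap concrete, a small illustrative sketch (not from the slides): a size-n resample drawn with replacement covers roughly 1 - 1/e, about 63%, of the distinct original points, so different trees see genuinely different data:

```python
# Sketch: fraction of distinct points appearing in a bootstrap resample.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
fractions = []
for _ in range(100):
    idx = rng.integers(0, n, size=n)   # sample n indices with replacement
    fractions.append(len(np.unique(idx)) / n)
print(np.mean(fractions))              # ~0.632
```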

Bias vs. variance revisited
- Decision trees grown to a large depth are very prone to overfitting: low bias, high variance.
- A decision "stump" with a max depth of 2 does not overfit, but it is not complex enough: high bias, low variance.
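A quick illustrative sketch of this contrast (synthetic data; assumes scikit-learn):

```python
# Sketch: deep tree vs. shallow tree, showing overfitting vs. underfitting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (2, None):  # stump-like tree vs. unlimited depth
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_tr, y_tr)
    print(depth, tree.score(X_tr, y_tr), tree.score(X_te, y_te))
# Expect: depth=None scores ~1.0 on train but worse on test (high variance);
# depth=2 scores lower but similarly on both (high bias).
```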

Bias vs. variance revisited
- Random forest: take a bunch of low-bias, high-variance trees and try to lower the variance.
  – Bias is already low, so don't worry about it; attack the variance.
  – (by parallel training with randomization, then taking a majority vote)
  – The randomization attacks the variance.
- Boosting: train a bunch of high-bias, low-variance learners and try to lower the bias.
  – Variance is already low, so don't worry about it; attack the bias.
  – (by sequential training with re-weighting, then taking a weighted-average classification)
  – The re-weighting attacks the bias.
- Boosting can be used with any learner, ideally a weak one (a common variant: linear SVMs). See the re-weighting sketch below.
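A sketch of one standard re-weighting scheme (AdaBoost with depth-1 stumps; the slide does not pin down a variant, so treat the details as illustrative). Labels are assumed to be in {-1, +1}:

```python
# Sketch: AdaBoost-style sequential re-weighting with decision stumps.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=50):        # y must be in {-1, +1}
    n = len(X)
    w = np.full(n, 1.0 / n)             # start with uniform weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        w *= np.exp(-alpha * y * pred)  # up-weight the points we got wrong
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)

    def predict(Xq):
        # Weighted vote over the sequence of weak learners.
        scores = sum(a * h.predict(Xq) for a, h in zip(alphas, learners))
        return np.sign(scores)
    return predict
```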

Random forests and boosting
- Both are "ensemble" methods.
- Both are among the most widely used ML algorithms in industry (the standard for fraud/spam detection) – neural nets are not typically used for fraud/spam-type tasks.
- In practice, random forests work better out-of-the-box (less tuning), but with tuning, boosting usually performs better.
- Most classification Kaggle competitions are won by 1) boosting or 2) neural nets.
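One way to check the out-of-the-box claim yourself, as a hedged sketch using scikit-learn defaults on a synthetic stand-in dataset:

```python
# Sketch: untuned random forest vs. untuned gradient boosting.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
for model in (RandomForestClassifier(random_state=0),
              GradientBoostingClassifier(random_state=0)):
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, scores.mean())
```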

Cool places RF/boosting is used
- effective-boosting-methods/answer/Tao-Xu (boosting)
- yPartRecognition.pdf (Kinect, RF)
- forest-classifier/ (RF)
- k.pdf (boosting + logistic reg.)
- Twitter, etc.

Next time: NEURAL NETWORKS