Ensemble Learning (2), Tree and Forest


Ensemble Learning (2), Tree and Forest Classification and Regression Tree Bagging of trees Random Forest

Motivation
To estimate a complex response surface or class boundary.
Reminder: SVM achieves this goal via the kernel trick; boosting achieves it by combining weak classifiers. A classification tree achieves it by generating a complex surface/boundary from a highly flexible model. Bagged classification trees and random forests achieve it in a manner similar to boosting.

Classification tree

Classification Tree An example classification tree.

Classification tree
Issues:
- How many splits should be allowed at a node?
- Which property should be used at a node?
- When to stop splitting a node and declare it a "leaf"?
- How to adjust the size of the tree? Tree size <-> model complexity: too large a tree overfits; too small a tree fails to capture the underlying structure.
- How to assign the classification decision at each leaf?
- How to handle missing data?

Classification Tree Binary split.

Classification Tree
To decide which split criterion to use, we need a measure of node impurity. Let $P(\omega_j)$ be the fraction of training points at node $N$ that belong to class $\omega_j$:
- Entropy: $i(N) = -\sum_j P(\omega_j)\log_2 P(\omega_j)$
- Misclassification: $i(N) = 1 - \max_j P(\omega_j)$
- Gini impurity: $i(N) = \sum_{i \neq j} P(\omega_i)P(\omega_j) = 1 - \sum_j P(\omega_j)^2$ (the expected error rate if the class label at the node is randomly permuted according to the node's class distribution).
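
As a concrete illustration (not part of the slides), a minimal NumPy sketch that computes all three impurity measures for one node; the function name node_impurities is made up:

```python
import numpy as np

def node_impurities(labels):
    """Entropy, misclassification rate, and Gini impurity of one node,
    given the class labels of the training points that reach it."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()                    # class proportions P(w_j)
    entropy = -np.sum(p * np.log2(p))            # -sum_j P log2 P
    misclassification = 1.0 - p.max()            # 1 - max_j P
    gini = 1.0 - np.sum(p ** 2)                  # 1 - sum_j P^2
    return entropy, misclassification, gini

# A node with 8 points of class 0 and 2 of class 1:
print(node_impurities(np.array([0] * 8 + [1] * 2)))   # (~0.722, 0.2, 0.32)
```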

Classification Tree

Classification Tree
Growing the tree.
Greedy search: at every step, choose the query that decreases the impurity as much as possible. For a real-valued predictor, the optimal cut value can be found by scanning the candidate thresholds (or, for more general split forms, by gradient descent).
When to stop?
- Stop when the reduction in impurity is smaller than a threshold.
- Stop when the leaf node is too small.
- Stop when a global criterion is met.
- Hypothesis testing.
- Cross-validation.
- Fully grow and then prune.
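
A minimal sketch of this greedy search (NumPy; the exhaustive scan over midpoints stands in for the 1-D threshold search, and the helper names gini and best_split are made up):

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Return the (feature, threshold) with the largest weighted
    impurity decrease for a single binary split."""
    n, d = X.shape
    parent = gini(y)
    best_j, best_s, best_drop = None, None, 0.0
    for j in range(d):
        values = np.unique(X[:, j])
        for s in (values[:-1] + values[1:]) / 2.0:    # midpoints between sorted values
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            drop = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if drop > best_drop:
                best_j, best_s, best_drop = j, s, drop
    return best_j, best_s, best_drop
```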

Classification Tree
Pruning the tree.
- Merge leaves when the loss of impurity is not severe.
- Cost-complexity pruning allows elimination of a whole branch in a single step.
When priors and costs are present, adjust training by adjusting the Gini impurity.
Assigning a class label to a leaf:
- No prior: take the class with the highest frequency at the node.
- With a prior: weigh the frequencies by the prior.
- With a loss function: choose the label that minimizes the expected loss. In every case, the assignment minimizes the Bayes error at the leaf.
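
Cost-complexity pruning is available directly in scikit-learn; the sketch below (dataset and cross-validation setup chosen only for illustration) grows a full tree, computes the pruning path, and picks the pruning strength by cross-validation:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Cost-complexity pruning path of a fully grown tree: each alpha on the
# path removes one weakest-link branch in a single step.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

# Pick the alpha with the best cross-validated accuracy.
cv_scores = [cross_val_score(DecisionTreeClassifier(random_state=0, ccp_alpha=a),
                             X_tr, y_tr, cv=5).mean()
             for a in path.ccp_alphas]
best_alpha = path.ccp_alphas[int(np.argmax(cv_scores))]

pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_tr, y_tr)
print("leaves:", pruned.get_n_leaves(), "test accuracy:", pruned.score(X_te, y_te))
```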

Classification Tree Choice of features.

Classification Tree Multivariate tree.

Classification Tree Example of error rate vs. tree size.

Regression tree Example: complex surface by CART.

Regression tree
Model the response as region-wise constant: $f(x) = \sum_{m=1}^{M} c_m I(x \in R_m)$, with $\hat{c}_m = \mathrm{ave}(y_i \mid x_i \in R_m)$.
Finding the best overall partition is computationally infeasible, so a greedy algorithm is used. Consider splitting variable $j$ and split point $s$, defining the half-planes $R_1(j,s) = \{x : x_j \le s\}$ and $R_2(j,s) = \{x : x_j > s\}$. Seek the $j$ and $s$ that minimize the RSS:
$\min_{j,s} \left[ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \right]$.
The inner minima are attained at the region means, so one can simply scan through all $j$ and the observed range of $x_j$ to find $s$. After the partition, treat each region as a separate dataset and iterate.
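
A minimal sketch of one step of this scan (NumPy; best_rss_split is a made-up helper name):

```python
import numpy as np

def best_rss_split(X, y):
    """Scan all predictors j and candidate cut points s; return the (j, s)
    minimizing the two-region residual sum of squares."""
    best_j, best_s, best_rss = None, None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[:-1]:            # candidate cut points
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            # The inner minimization over c1, c2 is solved by the region means.
            rss = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
            if rss < best_rss:
                best_j, best_s, best_rss = j, s, rss
    return best_j, best_s, best_rss
```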

Regression tree
Tree size <-> model complexity: too large a tree overfits; too small a tree fails to capture the underlying structure. How to tune?
- Grow the tree until the RSS reduction becomes too small: too greedy and "short-sighted", since a seemingly poor split may be followed by a very good one.
- Grow until the leaves are too small, then prune the tree back using cost-complexity pruning.
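
For reference (standard notation, supplied here rather than taken from the slides), the cost-complexity criterion minimized during pruning is

$$ C_\alpha(T) = \sum_{m=1}^{|T|} N_m Q_m(T) + \alpha\,|T|, \qquad Q_m(T) = \frac{1}{N_m}\sum_{x_i \in R_m} (y_i - \hat{c}_m)^2, $$

where $|T|$ is the number of terminal nodes, $N_m$ the number of training points in region $R_m$, and $\alpha \ge 0$ trades goodness of fit against tree size (larger $\alpha$ gives a smaller tree).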

Bootstrapping
Directly assess uncertainty from the training data.
Basic idea: assuming the empirical distribution of the data approaches the true underlying density, re-sampling from it (with replacement) gives us an idea of the uncertainty caused by sampling.
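
A minimal sketch (NumPy; the toy data and the choice of the median as the statistic are arbitrary) of how bootstrap resampling quantifies sampling uncertainty:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=200)   # toy sample; statistic of interest: median

# Draw B resamples of the same size, with replacement, and recompute the statistic.
B = 2000
boot_medians = np.array([np.median(rng.choice(data, size=data.size, replace=True))
                         for _ in range(B)])

# The spread of the bootstrap replicates estimates the sampling uncertainty.
print("bootstrap SE of the median:", boot_medians.std(ddof=1))
print("95% percentile interval:", np.percentile(boot_medians, [2.5, 97.5]))
```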

Bootstrapping

Bagging
“Bootstrap aggregation.” Resample the training dataset, build a prediction model on each resampled dataset, and average the predictions.
It is a Monte Carlo estimate of $\hat{f}_{\mathrm{bag}}(x) = E_{\hat{\mathcal{P}}}[\hat{f}^*(x)]$, where $\hat{\mathcal{P}}$ is the empirical distribution putting equal probability 1/N on each of the data points.
Bagging only differs from the original estimate when f() is a non-linear or adaptive function of the data; when f() is a linear function of the data, the bagged estimate coincides with the original as the number of bootstrap samples grows.
A tree is a perfect candidate for bagging: each bootstrap tree will differ in structure.
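
A sketch of bagging trees with scikit-learn (dataset chosen only for illustration; the parameter is named estimator in recent scikit-learn releases, base_estimator in older ones):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One deep tree: low bias, high variance.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Bagging: fit the same tree on bootstrap resamples and average (vote) the predictions.
bag = BaggingClassifier(estimator=DecisionTreeClassifier(random_state=0),
                        n_estimators=200, bootstrap=True,
                        random_state=0).fit(X_tr, y_tr)

print("single tree:", tree.score(X_te, y_te))
print("bagged trees:", bag.score(X_te, y_te))
```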

Bagging trees Bagged trees are of different structure.

Bagging trees Error curves.

Bagging trees Failure in bagging a single-level tree.

Random Forest
Bagging can be seen as a method to reduce the variance of an estimated prediction function. It mostly helps high-variance, low-bias classifiers. Comparatively, boosting builds weak classifiers one by one, allowing the collection to evolve in the right direction.
Random forest is a substantial modification of bagging: build a large collection of de-correlated trees.
- Similar performance to boosting.
- Simpler to train and tune than boosting.

Random Forest
The intuition: the average of random variables.
For B i.i.d. random variables, each with variance $\sigma^2$, the mean has variance $\sigma^2/B$.
For B identically distributed (but not independent) random variables, each with variance $\sigma^2$ and pairwise correlation $\rho$, the mean has variance $\rho\sigma^2 + \frac{1-\rho}{B}\sigma^2$.
Bagged trees are i.d. (not i.i.d.) samples, so as B grows the variance of their average is limited by the first term. Random forest aims at reducing the correlation $\rho$ to reduce the variance. This is achieved by random selection of variables at each split.
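
A quick simulation (NumPy; the values of B, rho, and sigma are arbitrary) confirming that the variance of the average of correlated variables is limited by the first term:

```python
import numpy as np

rng = np.random.default_rng(0)
B, rho, sigma = 50, 0.4, 1.0

# Covariance matrix with variance sigma^2 and pairwise correlation rho.
cov = sigma**2 * (rho * np.ones((B, B)) + (1 - rho) * np.eye(B))
draws = rng.multivariate_normal(mean=np.zeros(B), cov=cov, size=200_000)
means = draws.mean(axis=1)

print("empirical variance of the mean:", means.var())
print("rho*sigma^2 + (1-rho)*sigma^2/B =", rho * sigma**2 + (1 - rho) * sigma**2 / B)
```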

Random Forest

Random Forest Example comparing RF to boosted trees.

Random Forest Example comparing RF to boosted trees.

Random Forest
Benefit of RF: the out-of-bag (OOB) samples provide an essentially free cross-validation error. For sample i, compute its RF error using only the trees built from bootstrap samples in which sample i did not appear. The OOB error rate is close to the N-fold cross-validation error rate. Unlike many other nonlinear estimators, RF can be fit in a single sequence, with cross-validation performed along the way: stop growing the forest when the OOB error stabilizes.
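
In scikit-learn, the OOB estimate comes for free with oob_score=True (dataset chosen only for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# oob_score=True evaluates each sample using only the trees whose
# bootstrap sample did not contain it.
rf = RandomForestClassifier(n_estimators=500, oob_score=True,
                            random_state=0, n_jobs=-1).fit(X, y)

print("OOB accuracy estimate:", rf.oob_score_)   # close to N-fold CV, no extra fitting
```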

Random Forest
Variable importance: find the most relevant predictors.
- At every split of every tree, a variable contributes an improvement in the impurity measure. Accumulating the reduction of i(N) over all splits for each variable gives a measure of the relative importance of the variables: the predictors that appear most often at split points, and lead to the largest reductions of impurity, are the important ones.
- Another method: permute the values of a predictor in the OOB samples of each tree; the resulting decrease in prediction accuracy is also a measure of importance. Accumulate it over all trees.
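
Both measures are easy to obtain with scikit-learn; note that sklearn's permutation_importance permutes on a held-out set rather than on the OOB samples described above, but it illustrates the same idea (dataset chosen only for illustration):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

# 1) Impurity-based importance: accumulated impurity reduction per feature.
impurity_top5 = np.argsort(rf.feature_importances_)[::-1][:5]

# 2) Permutation importance: drop in accuracy when one feature is shuffled.
perm = permutation_importance(rf, X_te, y_te, n_repeats=20, random_state=0)
perm_top5 = np.argsort(perm.importances_mean)[::-1][:5]

print("top-5 by impurity reduction:", impurity_top5)
print("top-5 by permutation:       ", perm_top5)
```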

Random Forest