Random Forests. Roger Bohn, Big Data Analytics, Feb. 2016.

Harold Colson on good library data catalogs: Google Scholar, Web of Science, Business Source Complete, INSPEC, ACM Digital Library, IEEE Xplore, PubMed. See page 2.

Random Forests (DMRattle+R)
Build many decision trees (e.g., 500). For each tree:
 - Select a random subset of the training set (N).
 - Choose a different subset of variables for each node of the decision tree (m << M).
 - Build the tree without pruning (i.e., deliberately overfit).
Classify a new entity using every decision tree:
 - Each tree "votes" for the entity. The decision with the largest number of votes wins.
 - The proportion of votes is the resulting score. The outcome is a pseudo probability, 0 <= prob <= 1.
A minimal hand-rolled sketch of this bag-and-vote idea follows below.
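The sketch below illustrates only the bag-and-vote idea with single rpart trees. It is not what the randomForest package does internally (a real RF also re-samples m candidate variables at every split), and the use of the weather data set from the rattle package is an assumption carried over from later slides.

library(rpart)
library(rattle)     # assumed source of the 'weather' data set

set.seed(42)
ds    <- weather[ , -c(1:2, 23)]     # drop Date, Location and RISK_MM; keep RainTomorrow
ntree <- 50
trees <- vector("list", ntree)

for (b in seq_len(ntree)) {
  bag        <- sample(nrow(ds), replace = TRUE)            # bootstrap sample of rows
  trees[[b]] <- rpart(RainTomorrow ~ ., data = ds[bag, ],
                      control = rpart.control(cp = 0))      # grow without pruning
}

## every tree votes on every observation; majority wins
votes    <- sapply(trees, function(t) as.character(predict(t, ds, type = "class")))
vote.yes <- rowMeans(votes == "Yes")                        # proportion of "Yes" votes = pseudo probability
pred     <- ifelse(vote.yes > 0.5, "Yes", "No")             # majority-vote classification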

RF on weather data (figure).

The "model" is hundreds of small trees
Each tree is quick to solve, so the forest is computationally tractable.
Example output from one tree of an RF model:

## Tree 1 Rule 1 Node 30 Decision No
##
##  1: Evaporation <= 9
##  2: Humidity3pm <= 71
##  3: Cloud3pm <= 2.5
##  4: WindDir9am IN ("NNE")
##  5: Sunshine <=
##  6: Temp3pm <=

The final decision (yes/no, or a level) is read off just like a single tree.
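The listing above is the kind of rule printout Rattle produces for one tree of the forest. With the randomForest package itself, an individual tree can be inspected directly with getTree(); a small sketch, with the same assumed weather data:

library(randomForest)
library(rattle)                      # assumed source of the 'weather' data

set.seed(42)
ds   <- weather[ , -c(1:2, 23)]
m.rf <- randomForest(RainTomorrow ~ ., data = ds, na.action = na.roughfix)

## splits of tree number 1: one row per node, with the split variable,
## split point, daughter nodes and (for terminal nodes) the prediction
getTree(m.rf, k = 1, labelVar = TRUE)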

Error rates (plot).

Properties of RFs
 - Often works better than other methods.
 - Runs efficiently on large data sets.
 - Can handle hundreds of input variables.
 - Gives estimates of variable importance.
 - Results are easy to use, but too complex to summarize ("black box").
 - Cross-validation is built in: each tree uses a random set of observations (drawn with replacement), and the omitted ("out-of-bag") observations serve as the validation set for that tree.
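Two of these properties, the built-in out-of-bag validation and the variable-importance estimates, can be read straight off a fitted model. A small sketch, again assuming the rattle weather data:

library(randomForest)
library(rattle)                      # assumed source of the 'weather' data

set.seed(42)
ds   <- weather[ , -c(1:2, 23)]
m.rf <- randomForest(RainTomorrow ~ ., data = ds,
                     na.action = na.roughfix, importance = TRUE)

m.rf                      # printout includes the OOB (out-of-bag) error estimate
tail(m.rf$err.rate, 1)    # OOB and per-class error after the last tree
importance(m.rf)          # variable-importance measures
varImpPlot(m.rf)          # plot them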


R code
randomForest is one RF implementation in R; there are others.

library(rpart)
library(randomForest)
library(rattle)                      # source of the 'weather' data set

ds   <- weather[train, -c(1:2, 23)]  # 'train' holds the training-set row indices; drop Date, Location, RISK_MM
form <- RainTomorrow ~ .
m.rp <- rpart(form, data = ds)       # a single tree, for comparison
m.rf <- randomForest(form, data = ds,
                     na.action = na.roughfix,   # rough imputation of missing values
                     importance = TRUE)

Full argument list of randomForest() (from the package help page):

randomForest(x, y = NULL, xtest = NULL, ytest = NULL, ntree = 500,
             mtry = if (!is.null(y) && !is.factor(y))
                      max(floor(ncol(x)/3), 1) else floor(sqrt(ncol(x))),
             replace = TRUE, classwt = NULL, cutoff, strata,
             sampsize = if (replace) nrow(x) else ceiling(.632 * nrow(x)),
             nodesize = if (!is.null(y) && !is.factor(y)) 5 else 1,
             maxnodes = NULL, importance = FALSE, localImp = FALSE, nPerm = 1,
             proximity, oob.prox = proximity, norm.votes = TRUE,
             do.trace = FALSE, keep.forest = !is.null(y) && is.null(xtest),
             corr.bias = FALSE, keep.inbag = FALSE, ...)
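A quick follow-up showing how the fitted m.rf from above is used. The 'test' index is a hypothetical complement of the course's 'train' index, introduced here only for illustration:

test <- setdiff(seq_len(nrow(weather)), train)       # hypothetical held-out rows
newd <- na.roughfix(weather[test, -c(1:2, 23)])      # impute NAs the same rough way

pred <- predict(m.rf, newdata = newd)                 # majority-vote class
prob <- predict(m.rf, newdata = newd, type = "prob")  # vote proportions (pseudo probabilities)
table(actual = newd$RainTomorrow, predicted = pred)   # confusion matrix on the held-out rows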

Mechanics of RFs
 - Each tree uses a random "bag" of observations, roughly a 70/30 in-bag / out-of-bag split.
 - Each time a split in a tree is considered, a random selection of m predictors is chosen as candidates from the full set of p predictors.
 - The split chooses one of those m predictors, just like a single tree.
 - A fresh selection of m predictors is taken at each split.
 - Typically we choose m ≈ √p: the number of predictors considered at each split is approximately the square root of the total number of predictors. This is the randomForest default:
   mtry = if (!is.null(y) && !is.factor(y)) max(floor(ncol(x)/3), 1) else floor(sqrt(ncol(x)))
   i.e., p/3 for regression and √p for classification.
 - If a tree is deep, most of the p variables get considered at least once.
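A small sketch of setting m explicitly, or letting the package's tuneRF() helper search for it using the OOB error. Data assumptions as before; the parameter values are illustrative only:

library(randomForest)
library(rattle)                                # assumed source of the 'weather' data

set.seed(42)
ds <- na.roughfix(weather[ , -c(1:2, 23)])     # impute NAs so tuneRF() can run
p  <- ncol(ds) - 1                             # number of predictors

## m = sqrt(p), the usual choice for classification (also the package default)
m.rf <- randomForest(RainTomorrow ~ ., data = ds, mtry = floor(sqrt(p)))

## or search over m, judged by the OOB error
tuneRF(x = ds[ , names(ds) != "RainTomorrow"],
       y = ds$RainTomorrow,
       ntreeTry = 200, stepFactor = 1.5)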


Mechanics: combining trees
 - One RF run grows 500 trees by default, i.e., 500 small models. Check this! With many variables you may need more trees.
 - The final prediction or classification is based on voting.
 - Usually unweighted voting is used: all trees count equally. Votes can also be weighted, e.g., the most successful trees get the highest weights.
 - For classification: the majority of trees determines the classification.
 - For prediction problems (continuous outcomes): the average prediction of all the trees becomes the RF's prediction.
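With the randomForest package this aggregation is visible directly: predict(..., predict.all = TRUE) returns each tree's vote next to the combined answer. A small sketch, with the same data assumptions as before:

library(randomForest)
library(rattle)                                # assumed source of the 'weather' data

set.seed(42)
ds   <- weather[ , -c(1:2, 23)]
m.rf <- randomForest(RainTomorrow ~ ., data = ds, na.action = na.roughfix, ntree = 500)

newd <- na.roughfix(ds)                        # predict() needs complete cases
pr   <- predict(m.rf, newdata = newd, predict.all = TRUE)
dim(pr$individual)                             # one column of votes per tree: n rows x 500 columns
head(pr$aggregate)                             # the majority-vote classification
head(rowMeans(pr$individual == "Yes"))         # proportion of "Yes" votes across the 500 trees
head(predict(m.rf, newdata = newd, type = "prob"))   # the same vote proportions, from the package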

Case study: comparing methods. Source: Matt Taddy, Chicago Booth School, faculty.chicagobooth.edu/matt.taddy/teaching

Single tree result (figure).

Other concepts using trees

Generalize: groups of different models!
 - Many models are better than any one model.
 - Each model is better at classifying some situations.
 - "Boosting" algorithms build on this idea.
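Boosting grows trees sequentially, each new tree concentrating on the cases the earlier trees got wrong, and then combines them. One common R implementation is the gbm package; a minimal sketch, with the same assumed weather data and purely illustrative parameter values:

library(gbm)                                   # one common boosting implementation in R
library(rattle)                                # assumed source of the 'weather' data

set.seed(42)
ds <- na.roughfix(weather[ , -c(1:2, 23)])
ds$RainTomorrow <- as.integer(ds$RainTomorrow == "Yes")   # bernoulli loss wants a 0/1 response

m.gbm <- gbm(RainTomorrow ~ ., data = ds,
             distribution = "bernoulli",
             n.trees = 500,            # trees grown sequentially, not independently
             interaction.depth = 3,    # small trees
             shrinkage = 0.01,         # slow learning rate
             cv.folds = 5)

best <- gbm.perf(m.gbm, method = "cv")                       # tree count chosen by cross-validation
head(predict(m.gbm, ds, n.trees = best, type = "response"))  # predicted P(RainTomorrow = Yes)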


Comparing algorithms (framework, filled in during class)
Methods compared: single tree, random forest, logistic / regression, LASSO. Properties compared:
 - Nonlinear relationships? Single tree and random forest: Good. Logistic / regression: must pre-guess interactions. LASSO: same.
 - Explain to audience? Single tree: Good. Random forest: Good (most audiences).
 - Selecting variables (large p)
 - Variable importance
 - Handle continuous outcomes (predict)
 - Handle discrete outcomes (classify)
 - Number of OTSUs
The completed comparison is on the next slide.

Comparing algorithms (completed)
 - Nonlinear relationships? Single tree and random forest: Good. Logistic / regression: must pre-guess interactions. LASSO: same.
 - Explain to audience? Single tree: Very good. Random forest: Poor. Logistic / regression: Very good if trained. LASSO: Medium.
 - Selecting variables (large p): Single tree: Decent. Random forest: Good. Logistic / regression: Poor. LASSO: Very good.
 - Variable importance: Single tree: Weak. Random forest: Relative importance. Logistic / regression: Absolute importance. LASSO: Same.
 - Handle continuous outcomes (predict): Yes, all four.
 - Handle discrete outcomes (classify): Yes, all four.
 - Number of OTSUs: Who are we kidding? All have plenty of OTSUs. Hence the importance of validation, then test.