Regression Tree Learning
Gabor Melli, July 18th, 2013

Overview
What is a regression tree?
How to train a regression tree?
How to train one with R's rpart()?
How to train one with BigML.com?

Familiar with Classification Trees?

What is a Regression Tree? A trained predictor tree that is a regressed point-estimation function: each leaf node (and typically also each internal node) makes a point estimate.

Approach: recursive, top-down, greedy. (Figure: a first split at x < 1.54 gives two regions, one with Avg = 14, Err = 0.12 and one with Avg = 87, Err = 0.77; the rule is: if x < 1.54 then z = 14, else z = 87.)

Divide the sample space with orthogonal hyperplanes. (Figure: a split at x < 1.93 gives regions with Mean = 27, error = 0.19 and Mean = 161, error = 0.23; the rule is: if x < 1.93 then 27, else 161.)

Approach: recursive, top-down, greedy. (Figure: a candidate split with region averages Avg = 54, Err = 0.92 and Avg = 61, Err = 0.71.)

Divide the sample space with orthogonal hyperplanes. (Figure: resulting regions with err = 0.12 and err = 0.09.)

Divide the sample space with orthogonal hyperplanes
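To make the greedy step concrete, here is a minimal R sketch (illustrative only, not the rpart implementation; the data are made up to mimic the step at x = 1.93 above): every candidate threshold on a single numeric predictor is tried, and the one that most reduces the total within-node sum of squared errors is kept.

best_split <- function(x, y) {
  sse <- function(v) sum((v - mean(v))^2)
  thresholds <- sort(unique(x))[-1]              # candidate split points
  errs <- sapply(thresholds,
                 function(s) sse(y[x < s]) + sse(y[x >= s]))
  list(threshold = thresholds[which.min(errs)], sse = min(errs))
}

# Toy data shaped like the figure: a step at x = 1.93
set.seed(1)
x <- runif(500, 0, 4)
y <- ifelse(x < 1.93, 27, 161) + rnorm(500, sd = 5)
best_split(x, y)   # recovers a threshold close to 1.93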

Regression Tree (sample)

Stopping Criteria
– All records have the same target value.
– There are fewer than n records in the set.

Example

R Code

library(rpart)

# Load the data
synth_epc <- read.delim("synth_epc.tsv")
attach(synth_epc)

# Train the regression tree
synth_epc.rtree <- rpart(epcw0 ~ merch + user + epcw1 + epcw2,
                         synth_epc[, 1:5], cp = 0.01)

# Display the tree
plot(synth_epc.rtree, uniform = TRUE, main = "EPC Regression Tree")
text(synth_epc.rtree, digits = 3)

synth_epc.rtree (printed tree structure; the split thresholds and per-node statistics were lost in the transcript): the root splits on epcw1; both branches split again on epcw1 and then on user (userC, userB, or userB/userI versus the remaining users) before reaching the leaves, which are marked with *.
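As a quick follow-up (a sketch assuming the same session, with synth_epc and synth_epc.rtree still loaded), the fitted tree can be scored back onto the training frame and summarized with a root-mean-squared error:

# Score the training data and compute the RMSE of the fitted tree
preds <- predict(synth_epc.rtree, synth_epc)
sqrt(mean((synth_epc$epcw0 - preds)^2))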

BigML.com

Java class output

/* Predictor for epcw0 from model/51ef7f9e035d07603c00368c
 * Predictive model by BigML - Machine Learning Made Easy
 */
public static Double predictEpcw0(String user, Double epcw2, Double epcw1) {
    if (epcw1 == null) {
        return D;
    } else if (epcw1 <= 0.165) {
        if (epcw1 > 0.095) {
            if (user == null) {
                return D;
            } else if (user.equals("userC")) {
                return 0D;
…

PMML output: …

Pruning

# Prune the tree at cp = 0.0055
synth_epc.rtree <- prune(synth_epc.rtree, cp = 0.0055)

Determine the Best Complexity Parameter (cp) Value for the Model
The printcp() table shown on the slide lists, for each candidate cp: CP (complexity parameter), nsplit (# splits), rel error (roughly 1 - R^2 on the training data), xerror (cross-validated error), and xstd (cross-validated error SD); the numeric values did not survive the transcript.
The plotcp() figure plots the X-val relative error against cp and the size of the tree (# splits).

We can see that we need a cp value of about 0.0055 to give a tree with 11 leaves (terminal nodes).
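That choice can also be automated. A small sketch (assuming the synth_epc.rtree object fitted earlier): rpart stores the cross-validation results for each candidate cp in the fitted object's cptable, so the cp with the smallest xerror, or the simplest tree within one standard error of it, can be picked programmatically.

# Choose cp from the cross-validated error stored in the rpart object
cpt <- synth_epc.rtree$cptable
best_cp <- cpt[which.min(cpt[, "xerror"]), "CP"]

# One-standard-error rule: simplest tree whose xerror is within 1 SE of the minimum
thresh <- min(cpt[, "xerror"]) + cpt[which.min(cpt[, "xerror"]), "xstd"]
cp_1se <- cpt[which(cpt[, "xerror"] <= thresh)[1], "CP"]

pruned <- prune(synth_epc.rtree, cp = best_cp)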

Reduced-Error Pruning
A post-pruning, cross-validation approach:
– Partition the training data into a "grow" set and a "validation" set.
– Build a complete tree on the "grow" data.
– Until accuracy on the "validation" set decreases, do:
  For each non-leaf node in the tree:
  – Temporarily prune the tree below it; replace it by a majority vote.
  – Test the accuracy of the hypothesis on the validation set.
  Permanently prune the node giving the greatest increase in accuracy on the validation set.
Problem: uses less data to construct the tree.
Sometimes done at the rules level, where rules are generalized by erasing a condition (a different operation).
General strategy: overfit, then simplify. (A related rpart-based sketch follows.)
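A hedged sketch of the same idea with rpart (using the cpus data from the example below): instead of pruning node by node, grow a full tree on a "grow" subset, then evaluate each subtree in rpart's cost-complexity pruning sequence on a held-out validation set and keep the one with the lowest validation error. This is a related validation-set procedure, not the exact node-by-node algorithm above.

library(rpart); library(MASS); data(cpus)

set.seed(42)
grow_idx <- sample(nrow(cpus), size = round(2/3 * nrow(cpus)))
grow  <- cpus[grow_idx, 2:8]    # "grow" set
valid <- cpus[-grow_idx, 2:8]   # "validation" set

full <- rpart(log(perf) ~ ., data = grow, cp = 0.001)

# Validation-set MSE for every subtree in the pruning sequence
cps <- full$cptable[, "CP"]
val_mse <- sapply(cps, function(cp)
  mean((log(valid$perf) - predict(prune(full, cp = cp), valid))^2))

best <- prune(full, cp = cps[which.min(val_mse)])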

Regression Tree Pruning
Regression Tree Before Pruning (figure): the full cpus tree, with many splits on cach, mmax, chmax, chmin, and syct (cach < 27, mmax < 6100, mmax < 1750, mmax < 2500, chmax < 4.5, syct < 110, syct >= 360, chmin < 5.5, cach < 0.5, chmin >= 1.5, mmax < 1.4e+04, mmax < 2.8e+04, cach < 96.5, mmax < 1.124e+04, chmax < 14, ...).
Regression Tree After Pruning (figure): a much smaller tree, keeping only the top splits (cach < 27, mmax < 6100, mmax < 1750, syct >= 360, chmin < 5.5, cach < 0.5, mmax < 2.8e+04, cach < 96.5, mmax < 1.1e+04, ...).

How well does it fit? Plot of residuals

Testing w/Missing Values
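A minimal sketch of what this slide demonstrates, using the cpus tree from the example below (an assumption; the slide's own data is not shown): rpart keeps surrogate splits by default, so a record whose primary split variable is missing can still be routed down the tree and scored.

library(rpart); library(MASS); data(cpus)
cpus.rp <- rpart(log(perf) ~ ., cpus[, 2:8], cp = 0.001)

new_case <- cpus[1, 2:8]
new_case$cach <- NA           # knock out the root split variable (cach < 27)
predict(cpus.rp, new_case)    # still returns a prediction, via surrogate splits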

THE END

Regression trees: example 1

R Code

library(rpart)
library(MASS)
data(cpus)
attach(cpus)

# Fit regression tree to data
cpus.rp <- rpart(log(perf) ~ ., cpus[, 2:8], cp = 0.001)

# Print and plot the complexity parameter (cp) table
printcp(cpus.rp)
plotcp(cpus.rp)

# Prune and display tree
cpus.rp <- prune(cpus.rp, cp = 0.0055)
plot(cpus.rp, uniform = TRUE, main = "Regression Tree")
text(cpus.rp, digits = 3)

# Plot residuals vs. predicted values
plot(predict(cpus.rp), resid(cpus.rp))
abline(h = 0)

TreeGrowing(S, A, y):
Create a new tree T with a single root node.
IF one of the stopping criteria is fulfilled THEN
– Mark the root node in T as a leaf, with the most common value of y in S as its label.
ELSE
– Find a discrete function f(A) of the input attribute values such that splitting S according to f(A)'s outcomes (v1, ..., vn) gives the best splitting metric.
– IF the best splitting metric > threshold THEN
  Label the root node t with f(A).
  FOR each outcome vi of f(A):
  – Set Subtree_i = TreeGrowing(σ_{f(A)=vi}(S), A, y).
  – Connect the root node t of T to Subtree_i with an edge labelled vi.
  END FOR
– ELSE
  Mark the root node in T as a leaf, with the most common value of y in S as its label.
– END IF
END IF
RETURN T
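A compact, runnable R sketch of this procedure for the special case of one numeric predictor and a numeric target (an illustration, not rpart: the splitting metric is the reduction in the sum of squares, leaves predict the mean, and a threshold test plays the role of f(A)).

grow_tree <- function(x, y, min_n = 10, min_gain = 1e-6) {
  node <- list(n = length(y), prediction = mean(y))
  # Stopping criteria: too few records, or all target values identical
  if (length(y) < min_n || length(unique(y)) == 1) return(node)
  sse <- function(v) sum((v - mean(v))^2)
  thresholds <- sort(unique(x))[-1]
  if (length(thresholds) == 0) return(node)
  gains <- sapply(thresholds,
                  function(s) sse(y) - (sse(y[x < s]) + sse(y[x >= s])))
  if (max(gains) <= min_gain) return(node)   # best split below threshold: make a leaf
  s <- thresholds[which.max(gains)]
  node$split <- s
  node$left  <- grow_tree(x[x <  s], y[x <  s], min_n, min_gain)
  node$right <- grow_tree(x[x >= s], y[x >= s], min_n, min_gain)
  node
}

# Example: recover the step function from the earlier figure
set.seed(2)
x <- runif(300, 0, 4)
y <- ifelse(x < 1.54, 14, 87) + rnorm(300, sd = 3)
str(grow_tree(x, y), max.level = 1)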

Measures Used in Fitting a Regression Tree
Instead of the Gini index, the impurity criterion is the sum of squares, so the split that causes the biggest reduction in the sum of squares is selected. In pruning the tree, the measure used is the mean squared error of the predictions made by the tree.
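In symbols, with \(\bar{y}_t\) denoting the mean target value of the cases in node t, a split of node t into children t_L and t_R is chosen to maximize the reduction in the within-node sum of squares:

\[
\Delta(t) \,=\, \sum_{i \in t} (y_i - \bar{y}_t)^2 \;-\; \left[\, \sum_{i \in t_L} (y_i - \bar{y}_{t_L})^2 \,+\, \sum_{i \in t_R} (y_i - \bar{y}_{t_R})^2 \right]
\]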

Regression trees: summary
Growing the tree:
– Split to optimize the splitting criterion (information gain for classification; sum-of-squares reduction for regression).
At each leaf node:
– Predict the majority class (for a regression tree, the mean target value).
Pruning the tree:
– Prune to reduce error on a holdout set.
Prediction:
– Trace the path to a leaf and predict its associated value.
Model trees [Quinlan's M5]:
– Build a linear model at each leaf, then greedily remove features.
– Error estimates are adjusted by (n+k)/(n-k), where n = #cases and k = #features.
– The estimated error on the training data uses a linear interpolation of the predictions made by every node on the path.