
1. Statistics & R, TiP, 2011/12: Tree-Based Methods
- Methods for analyzing problems of discrimination and regression
- Classification & decision trees: for factor outcomes
- Regression trees: for continuous outcomes
- Difference from other methods is in effective display and intuitive appeal

2. Classification Trees
- Aim is to find a rule for classifying cases (objects) into categories
- Use a step-by-step approach (one variable at a time)
- Similar problems of evaluating performance as with other methods: high dimensions and complicated rules give over-optimistic performance

3. Example: Iris data
- 1st: divide on petal length
  - If petal length < 2.5 then type "S"
- 2nd: petal width
  - If width < 1.75 then most are of type "C"
  - If width > 1.75 then "V"
- 3rd: petal length again (within width < 1.75)
  - If length < 4.95 then "C"
  - If length > 4.95 then "V"

4. Can display this as a tree:

Is petal length < 2.5?
  yes: Type "S"
  no:  Is petal width < 1.75?
         no:  Type "V"
         yes: Is petal length < 4.95?
                yes: Type "C"
                no:  Type "V"
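The three questions in this tree can be written out as a plain function and checked against data. The sketch below is an illustration only: it assumes R's built-in `iris` data frame rather than the course's `ir` and `ir.species` objects, and the function name `classify_iris` is hypothetical. On the built-in data the count of wrong cases should agree with the 4/150 quoted for this tree.

```r
# Hand-coded version of the three-question tree (illustrative sketch).
# Labels "S", "C", "V" stand for setosa, versicolor, virginica as on the slides.
classify_iris <- function(petal_length, petal_width) {
  ifelse(petal_length < 2.5, "S",
    ifelse(petal_width >= 1.75, "V",
      ifelse(petal_length < 4.95, "C", "V")))
}

# Apply the rule to the built-in iris data (an assumption: the course used ir).
pred  <- classify_iris(iris$Petal.Length, iris$Petal.Width)
truth <- unname(c(setosa = "S", versicolor = "C", virginica = "V")[as.character(iris$Species)])

# Cross-tabulate predicted against actual type and count the errors.
confusion <- table(predicted = pred, actual = truth)
print(confusion)
n_wrong <- sum(pred != truth)
cat("Misclassified:", n_wrong, "of", length(truth), "\n")
```

Writing the rule with nested `ifelse()` keeps it vectorised, so the whole data set is classified in one call.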

5.
> library(tree)
> ir.tr<-
+ tree(ir.species~ir)
> plot(ir.tr)
> text(ir.tr,
+ all=T,cex=0.8)

Note:
- call to library(tree)
- addition of labels with text()
- cex controls character size

6.
- Note the misclassification rate with this tree is 4/150, i.e. the correct rate is 146/150
- Compare LDA: 147/150 correct
- Could look at cross-validation: special routine cv.tree(.)
- Could permute labels
- Note we can grow the tree on a random sample of the data and then use it to classify new data (as with lda)
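The LDA comparison and the label-permutation idea can be sketched as follows. This is an illustration under stated assumptions: it uses the built-in `iris` data frame instead of the course objects, `lda()` from the MASS package, and an arbitrary seed. The object names are hypothetical. The first count should come out close to the 147/150 quoted above. The second shows what a resubstitution rate looks like when the labels carry no information.

```r
library(MASS)  # provides lda()

# Resubstitution (training-set) accuracy of LDA on the built-in iris data.
iris_lda  <- lda(Species ~ ., data = iris)
lda_pred  <- predict(iris_lda, iris)$class
n_correct <- sum(lda_pred == iris$Species)
cat("LDA correct:", n_correct, "of", nrow(iris), "\n")

# Permute the labels: any structure linking measurements to species is destroyed,
# so the apparent success rate on shuffled labels is a baseline for chance.
set.seed(1)  # arbitrary seed, only for reproducibility
shuffled     <- transform(iris, Species = sample(Species))
perm_lda     <- lda(Species ~ ., data = shuffled)
perm_correct <- sum(predict(perm_lda, shuffled)$class == shuffled$Species)
cat("LDA correct with permuted labels:", perm_correct, "of", nrow(iris), "\n")
```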

7.
> irsamp.tr <- ...
> ir.pred<-predict(irsamp.tr,
+ ir[-samp,],type="class")
> table(ir.pred,ir.species[-samp])

ir.pred  c  s  v
      c 24  0  0
      s  0 25  0
      v  1  0 25

- So correct classification rate of 74/75
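The grow-on-a-sample, test-on-the-rest procedure can be sketched end to end. Several things here are assumptions rather than the course's code: the built-in `iris` data replaces `ir` and `ir.species`, `rpart()` is swapped in for `tree()` because rpart ships with R, and `samp` is drawn as a stratified half-sample (25 per species), which matches the column totals of the table above but is not confirmed by the transcript. The counts will differ from 74/75 because the sample and the fitting routine differ.

```r
library(rpart)  # stand-in for library(tree); an assumption, not the course's choice

set.seed(2011)  # arbitrary seed, only for reproducibility
# Stratified half-sample: 25 training rows from each species block of iris.
samp <- c(sample(1:50, 25), sample(51:100, 25), sample(101:150, 25))

# Grow the tree on the sampled half only.
holdout_fit <- rpart(Species ~ ., data = iris[samp, ], method = "class")

# Classify the held-out half and cross-tabulate against the true species.
holdout_pred <- predict(holdout_fit, iris[-samp, ], type = "class")
holdout_tab  <- table(predicted = holdout_pred, actual = iris$Species[-samp])
print(holdout_tab)

# Proportion of held-out cases on the diagonal of the table.
holdout_rate <- sum(diag(holdout_tab)) / sum(holdout_tab)
cat("Correct classification rate:", holdout_rate, "\n")
```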

8. Other facilities
- snip.tree(.)
  - Interactive chopping of tree to remove unwanted branches
  - Works in a similar way to identify()
  - Try help(snip.tree)
- library(help=tree) for a list of all facilities in library tree
- Also library(rpart)
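Chopping a branch can also be done without the mouse. The sketch below uses `snip.rpart()` from the rpart package, given the node number to cut at through its `toss` argument. It is an illustration on the built-in `iris` data, not the course's example, and the helper `count_leaves` is hypothetical.

```r
library(rpart)

# Fit a classification tree; printing it shows the node numbers.
full_fit <- rpart(Species ~ ., data = iris, method = "class")
print(full_fit)

# Remove everything below node 3, turning that node into a leaf.
snipped_fit <- snip.rpart(full_fit, toss = 3)
print(snipped_fit)

# Hypothetical helper: count terminal nodes in an rpart fit.
count_leaves <- function(fit) sum(fit$frame$var == "<leaf>")
cat("Leaves before:", count_leaves(full_fit),
    "after:", count_leaves(snipped_fit), "\n")
```

Calling `snip.rpart(full_fit)` with no `toss` argument on a plotted tree gives the interactive, click-to-select behaviour.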

9. Similar Methods
- Decision trees
  - Essentially the same as classification trees
  - See shuttle example
- Regression trees
  - Continuous outcome to be predicted from explanatory (independent) variables
  - Explanatory variables can be continuous, ordered factors, or multiple unordered categories
- Continuous outcome is made 'discrete', which makes it similar to classification trees

10.
> library(MASS)  # provides the cpus data
> cpus.tr<-
+ tree(log(perf)~.,cpus[,2:8])
> plot(cpus.tr)
> text(cpus.tr,cex=1.0)

Gives a quick way of predicting performance from properties, e.g. a machine with cach=25, mmax=7500, syct=300, chmin=6.0
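Reading a prediction off the plotted tree can also be done with `predict()`. The sketch below is an assumption-laden illustration: it swaps in `rpart()` for `tree()`, so the splits and the predicted value need not match the plotted tree, and it leaves `mmin` and `chmax` missing because the example machine above specifies only four of the six predictors. The object names are hypothetical.

```r
library(MASS)   # cpus data
library(rpart)  # stand-in for library(tree); an assumption

# Regression tree for log performance on the six machine characteristics.
cpus_fit <- rpart(log(perf) ~ ., data = cpus[, 2:8], method = "anova")

# The example machine: mmin and chmax are not given, so they are set to NA and
# rpart uses surrogate splits (or the majority direction) where it needs them.
new_machine <- data.frame(syct = 300, mmin = NA_real_, mmax = 7500,
                          cach = 25, chmin = 6, chmax = NA_real_)

log_perf_hat <- predict(cpus_fit, newdata = new_machine)
perf_hat     <- exp(log_perf_hat)  # back to the original performance scale
cat("Predicted perf:", perf_hat, "\n")

# A regression tree predicts one constant per leaf, so the continuous outcome
# is effectively made discrete: at most one fitted value per terminal node.
n_leaves        <- sum(cpus_fit$frame$var == "<leaf>")
n_fitted_values <- length(unique(predict(cpus_fit)))
cat("Leaves:", n_leaves, "distinct fitted values:", n_fitted_values, "\n")
```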

11. Comments on mathematics
- PCA and lda have a rigorous mathematical foundation
  - Obtained from applications of general statistical theory
  - Results similar to Neyman-Pearson Lemma etc.
- Tree-based methods WORK in practice
  - Algorithmic basis instead of mathematical
  - Give good results in some cases when classical methods are less satisfactory

12. Summary
- Classification & regression trees
  - Take one variable at a time
  - Facilities for cross-validation and randomization
  - Variables can be continuous, or ordered or unordered factors
  - Facilities for interactive pruning
  - Can be problems with high dimensions and small numbers of cases
  - Theoretical foundation is algorithmic, not mathematical
  - They can WORK in practice

