(classification & regression trees)

1 Classification trees: CART (classification & regression trees)

2 Tree algorithm: binary recursive partitioning
The classification (or regression) tree method examines all the predictors (Xs) and finds the value of the predictor that best splits the data into two groups ("nodes") that are as different as possible on the outcome. This process is repeated independently in the two "daughter" nodes created by the split until either the final ("terminal") nodes are homogeneous (all observations have the same outcome value), the node sample size is too small (default: n < 5), or the difference between the two daughter nodes is not statistically significant.
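The splitting-and-stopping logic above can be sketched in a few lines of Python. This is an illustrative toy using Gini impurity as the node criterion, not JMP's implementation; the names (gini, best_split, grow_tree) are made up, and the n < 5 stopping rule mirrors the default mentioned above.

```python
def gini(y):
    """Gini impurity of a list of 0/1 outcomes."""
    p = sum(y) / len(y)
    return 2 * p * (1 - p)

def best_split(X, y):
    """Try every predictor and cutpoint; return the split that most
    reduces impurity, as (feature index, threshold, reduction)."""
    best = (None, None, 0.0)
    parent = gini(y) * len(y)
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            drop = parent - (gini(left) * len(left) + gini(right) * len(right))
            if drop > best[2]:
                best = (j, t, drop)
    return best

def grow_tree(X, y, min_n=5):
    """Recurse until a node is homogeneous or too small (n < min_n)."""
    if len(y) < min_n or len(set(y)) == 1:
        return {"predict": round(sum(y) / len(y))}   # terminal node
    j, t, drop = best_split(X, y)
    if j is None:
        return {"predict": round(sum(y) / len(y))}
    keep_l = [(r, yi) for r, yi in zip(X, y) if r[j] <= t]
    keep_r = [(r, yi) for r, yi in zip(X, y) if r[j] > t]
    return {"feature": j, "threshold": t,
            "left": grow_tree([r for r, _ in keep_l], [yi for _, yi in keep_l], min_n),
            "right": grow_tree([r for r, _ in keep_r], [yi for _, yi in keep_r], min_n)}
```

For example, `grow_tree([[1], [2], [3], [10], [11], [12]], [0, 0, 0, 1, 1, 1])` splits at the root and returns two pure terminal nodes.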

3 Ex: Kidwell - hemorrhagic stroke
Data: n = 89; binary outcome Y (54 with no hemorrhage, 35 with hemorrhage).
Potential predictors:
Glucose level
Platelet count
Hematocrit
Time to recanalization
Coumadin used (y/n)
NIH stroke score (0-42)
Age
Sex
Weight
SBP
DBP
Diabetes

4

5 Classification Matrix
Hemorrhage model classification matrix:

                     Actual
Predicted      no    yes    total
  no           45     11      56
  yes           9     24      33
  total        54     35      89

Sensitivity = 24/35 = 69%
Specificity = 45/54 = 83%
Accuracy = (45 + 24)/89 = 78%
C = R² (Hosmer) = 0.606
df = 89 - 5 = 84
Deviance/df = 72.3/84 = 0.861
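The slide's statistics follow directly from the matrix counts. The snippet below recomputes them; the variable names (tn, fp, etc.) are the standard confusion-matrix abbreviations, not names used on the slide.

```python
# Counts from the hemorrhage model classification matrix above.
tn, fn = 45, 11   # predicted "no":  45 actual no, 11 actual yes
fp, tp = 9, 24    # predicted "yes":  9 actual no, 24 actual yes

sensitivity = tp / (tp + fn)                  # 24/35
specificity = tn / (tn + fp)                  # 45/54
accuracy = (tp + tn) / (tn + fn + fp + tp)    # 69/89

print(f"sensitivity = {sensitivity:.0%}")  # 69%
print(f"specificity = {specificity:.0%}")  # 83%
print(f"accuracy    = {accuracy:.0%}")     # 78%
```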

6 JMP tree ("partition") details
JMP reports chi-square (G²) values for each node. The chi-square for testing the significance of a split is computed as

G² test = G² parent - (G² left + G² right)

Rather than the p value, JMP reports the "LogWorth" (common in genetics) for the G² test:

LogWorth = -log10(p value)

(Titanic example on the following slides.)
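The LogWorth transform is a one-liner; larger values mean stronger splits. The sketch below is illustrative (the p values are made up), and `log_worth` is a hypothetical helper name, not a JMP function.

```python
import math

def log_worth(p_value):
    """JMP's LogWorth: -log10(p). A p value of 0.01 gives LogWorth 2,
    and each extra unit of LogWorth is another factor of 10 in p."""
    return -math.log10(p_value)

print(log_worth(0.05))   # about 1.3
print(log_worth(0.01))   # 2.0
print(log_worth(0.001))  # 3.0
```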

7 JMP partition details (cont)
Tree building can be automated with the "k fold cross validation" option (k = 5 by default). Choose this option, then select "go". The tree may need to be "pruned" (non-significant terminal nodes removed).
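The idea JMP automates can be sketched as follows: partition the rows into k folds, fit on k-1 folds, score on the held-out fold, and keep the tree size that validates best. This is a schematic using only the standard library; `make_folds` is an illustrative helper, not a JMP function, and the fitting step is left as a comment.

```python
import random

def make_folds(n, k=5, seed=0):
    """Return k index lists that partition range(n), shuffled."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = make_folds(89, k=5)   # 89 = Kidwell data sample size
for test_idx in folds:
    train_idx = [j for f in folds if f is not test_idx for j in f]
    # a tree would be grown on train_idx and scored on test_idx here;
    # the tree size with the best average validation score is kept
    assert len(train_idx) + len(test_idx) == 89
```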

8 Tree from Titanic data: Y = survival

9 Titanic tree fit stats: ROC area = 0.8095

10 Titanic tree – fit stats

Measure                  Training   Definition
Entropy RSquare          0.3042     1 - Loglike(model)/Loglike(0)
Generalized RSquare      0.4523     (1 - (L(0)/L(model))^(2/n)) / (1 - L(0)^(2/n))
Mean -Log p              0.4628     Σ -Log(ρ[j]) / n
RMSE                     0.3842     √( Σ (y[j] - ρ[j])² / n )
Mean Abs Dev             0.2956     Σ |y[j] - ρ[j]| / n
Misclassification Rate   0.2116     Σ (ρ[j] ≠ ρMax) / n
n                        1309

Survived classification matrix:

                   Predicted
Actual        No    Yes   Total     Acc
  No         774     35     809   95.7%
  Yes        242    258     500   51.6%
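The misclassification rate in the table can be checked directly against the confusion matrix: the off-diagonal counts are 35 + 242 = 277 errors out of n = 1309. The variable names below are descriptive labels, not from the slide.

```python
# Counts from the Titanic survival confusion matrix above.
correct_no, wrong_no = 774, 35      # actual "No":  774 right, 35 wrong
wrong_yes, correct_yes = 242, 258   # actual "Yes": 258 right, 242 wrong

n = correct_no + wrong_no + wrong_yes + correct_yes   # 1309 passengers
misclassification = (wrong_no + wrong_yes) / n
acc_no = correct_no / (correct_no + wrong_no)         # 774/809
acc_yes = correct_yes / (correct_yes + wrong_yes)     # 258/500

print(round(misclassification, 4))  # 0.2116, matching the table
print(f"{acc_no:.1%}")              # 95.7%
print(f"{acc_yes:.1%}")             # 51.6%
```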

