Decision Tree Lab
Load the iris data: data(iris)
Display the iris data as a sanity check: iris
Load the rpart package: library(rpart). Install it first if necessary: install.packages("rpart")
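The setup steps above can be sketched as follows. The requireNamespace() check is only one possible way to handle "install if necessary", and the dim(), str(), and head() calls are suggested alternatives to printing all 150 rows; neither is prescribed by the lab.

```r
# Install rpart only if it is missing (assumption: a CRAN mirror is reachable).
# rpart usually ships with R as a recommended package, so this is often a no-op.
if (!requireNamespace("rpart", quietly = TRUE)) {
  install.packages("rpart")
}
library(rpart)

# iris is part of R's built-in datasets package
data(iris)

# Compact sanity checks instead of printing the whole data frame
print(dim(iris))   # number of rows and columns
str(iris)          # column names and types
print(head(iris))  # first six rows
```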
We will use rpart() to build the tree and store the result in fit
First: understand the arguments to rpart()
– rpart(formula, data=, method=, control=)
– formula: outcome ~ predictor1 + predictor2 + …
– data: specifies the data frame
– method: "class" for a classification tree
– control: optional parameters for controlling tree growth
In the case of the iris dataset
– formula: Species ~ Petal.Length + Petal.Width + Sepal.Length + Sepal.Width
– data = iris
– method = "class"
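A small sketch of how the formula relates to the data frame. Because the four predictors listed are all of the non-outcome columns of iris, the shorthand Species ~ . names the same set of predictors. The variable names f_long and f_short are made up for this illustration.

```r
# The formula written out in full, as in the lab
f_long <- Species ~ Petal.Length + Petal.Width + Sepal.Length + Sepal.Width

# Shorthand: "." stands for every column of the data frame except the outcome
f_short <- Species ~ .

# Expand both formulas against iris and list their predictor names
labels_long  <- attr(terms(f_long,  data = iris), "term.labels")
labels_short <- attr(terms(f_short, data = iris), "term.labels")

print(labels_long)
print(labels_short)

# Same predictors, possibly listed in a different order
same_predictors <- setequal(labels_long, labels_short)
print(same_predictors)
```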
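In the case of the iris dataset
– control = rpart.control(minsplit=2, cp=0.001)
– minsplit=2: a node must contain at least 2 observations before a split is attempted
– cp=0.001: a split is kept only if it improves the overall fit by at least a factor of 0.001 (cp is the cost-complexity parameter)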
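To see what the control settings do, one option is to fit the tree twice, once with rpart's defaults (minsplit = 20, cp = 0.01) and once with the lab's values, and compare the number of leaves. This is a sketch: count_leaves() is a hypothetical helper written for this comparison, not an rpart function.

```r
library(rpart)

iris_formula <- Species ~ Petal.Length + Petal.Width + Sepal.Length + Sepal.Width

# Tree grown with the default control values (minsplit = 20, cp = 0.01)
fit_default <- rpart(iris_formula, data = iris, method = "class")

# Tree grown with the lab's settings: very small nodes may be split,
# and splits with small improvements are kept
fit_deep <- rpart(iris_formula, data = iris, method = "class",
                  control = rpart.control(minsplit = 2, cp = 0.001))

# Hypothetical helper: terminal nodes are marked "<leaf>" in the frame table
count_leaves <- function(tree) sum(tree$frame$var == "<leaf>")

leaves_default <- count_leaves(fit_default)
leaves_deep    <- count_leaves(fit_deep)

cat("Leaves with default control:", leaves_default, "\n")
cat("Leaves with minsplit=2, cp=0.001:", leaves_deep, "\n")
```

Looser settings let the tree grow further, so the second tree should have at least as many leaves as the first.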
Altogether:
fit = rpart(Species ~ Petal.Length + Petal.Width + Sepal.Length + Sepal.Width, method = "class", data = iris, control = rpart.control(minsplit=2, cp=0.001))
Examine the decision tree:
print(fit)
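Beyond print(fit), a couple of further ways to examine the fitted tree are sketched below: the complexity table from printcp() and a confusion matrix on the training data. The names pred, conf_mat, and train_acc are chosen here for illustration and are not part of the lab.

```r
library(rpart)

# Same call as in the lab
fit <- rpart(Species ~ Petal.Length + Petal.Width + Sepal.Length + Sepal.Width,
             method = "class", data = iris,
             control = rpart.control(minsplit = 2, cp = 0.001))

# Complexity table: one row per candidate tree size
printcp(fit)

# Predicted species for each training row
pred <- predict(fit, newdata = iris, type = "class")

# Confusion matrix: rows are the true species, columns the predictions
conf_mat <- table(actual = iris$Species, predicted = pred)
print(conf_mat)

# Share of training rows classified correctly
train_acc <- sum(diag(conf_mat)) / sum(conf_mat)
cat("Training accuracy:", train_acc, "\n")
```

Accuracy measured on the same rows used to grow the tree is optimistic, especially with settings as permissive as minsplit=2.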
Plot the decision tree:
plot(fit, uniform=TRUE, main="Classification Tree for Iris Dataset")
Label the tree:
text(fit, use.n=TRUE, all=TRUE, cex=.7)
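The plotting steps can also be run non-interactively by writing to a file. The sketch below assumes a PDF in a temporary location (out_file is a made-up name) and adds margin = 0.1, an optional plot.rpart argument that leaves room around the tree so the labels are less likely to be clipped.

```r
library(rpart)

fit <- rpart(Species ~ Petal.Length + Petal.Width + Sepal.Length + Sepal.Width,
             method = "class", data = iris,
             control = rpart.control(minsplit = 2, cp = 0.001))

# Assumed output location; change to any path you like
out_file <- file.path(tempdir(), "iris_tree.pdf")

pdf(out_file, width = 8, height = 6)

# Draw the tree skeleton with evenly spaced levels
plot(fit, uniform = TRUE, margin = 0.1,
     main = "Classification Tree for Iris Dataset")

# Add split rules and per-node class counts
text(fit, use.n = TRUE, all = TRUE, cex = 0.7)

# Close the device so the file is written to disk
dev.off()

cat("Tree written to", out_file, "\n")
```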