Decision Trees
Prof. Carolina Ruiz, Dept. of Computer Science, WPI
Constructing a decision tree
Which attribute should be used as the root node? That is, which attribute should be checked first when making a prediction? Pick the attribute that brings us closest to a decision: the attribute that splits the data most homogeneously.
Which attribute splits the data more homogeneously?
Goal: assign a number to each attribute that represents how well it "splits" the dataset according to the target attribute (risk). For each candidate attribute, the class counts below are ordered [# low, # moderate, # high] risk:
- credit history: bad [0,1,3], unknown [2,1,2], good [3,1,1]
- debt: low [3,2,2], high [2,1,4]
- collateral: none [3,2,6], adequate [2,1,0]
- income: 0-15 [0,0,4], 15-35 [0,2,2], >35 [5,1,0]
For example: what function f should we use, so that f([0,1,3],[2,1,2],[3,1,1]) = a single number scoring the credit-history split? Possible f functions:
- Gini index: a measure of impurity
- Entropy: from information theory
- Misclassification error: the metric used by OneR
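As a sketch (not from the lecture), the three candidate f functions can be written in a few lines of Python; the names gini, entropy, misclassification_error, and f are illustrative, not from the slides:

import math

def gini(counts):
    # Gini index: 1 minus the sum of squared class proportions (0 = pure).
    m = sum(counts)
    return 1.0 - sum((c / m) ** 2 for c in counts)

def entropy(counts):
    # Entropy in bits: -sum p*log2(p), with 0*log2(0) taken as 0.
    m = sum(counts)
    return -sum((c / m) * math.log2(c / m) for c in counts if c > 0)

def misclassification_error(counts):
    # Fraction of instances outside the majority class (the OneR-style metric).
    m = sum(counts)
    return 1.0 - max(counts) / m

def f(*branches, impurity=entropy):
    # Weighted average impurity over the branches of a split.
    n = sum(sum(b) for b in branches)
    return sum((sum(b) / n) * impurity(b) for b in branches)

print(f([0, 1, 3], [2, 1, 2], [3, 1, 1]))  # score for the credit-history split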
Using entropy as the f metric
f([0,1,3],[2,1,2],[3,1,1]) = Entropy([0,1,3],[2,1,2],[3,1,1])
= (4/14)·Entropy([0,1,3]) + (5/14)·Entropy([2,1,2]) + (5/14)·Entropy([3,1,1])
= (4/14)·[ -0 - (1/4)log₂(1/4) - (3/4)log₂(3/4) ]
+ (5/14)·[ -(2/5)log₂(2/5) - (1/5)log₂(1/5) - (2/5)log₂(2/5) ]
+ (5/14)·[ -(3/5)log₂(3/5) - (1/5)log₂(1/5) - (1/5)log₂(1/5) ]
≈ (4/14)·0.811 + (5/14)·1.522 + (5/14)·1.371 ≈ 1.265 bits
In general: Entropy([p,q,…,z]) = -(p/m)log₂(p/m) - (q/m)log₂(q/m) - … - (z/m)log₂(z/m), where m = p+q+…+z (and 0·log₂0 is taken to be 0).
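A minimal Python check of this computation, assuming the same class counts; entropy here is the standard information-theoretic definition, with 0·log₂0 treated as 0:

import math

def entropy(counts):
    # Entropy in bits; classes with count 0 contribute nothing.
    m = sum(counts)
    return -sum((c / m) * math.log2(c / m) for c in counts if c > 0)

branches = [[0, 1, 3], [2, 1, 2], [3, 1, 1]]    # bad, unknown, good
n = sum(sum(b) for b in branches)               # 14 instances in total
split_entropy = sum((sum(b) / n) * entropy(b) for b in branches)
print(round(split_entropy, 3))                  # -> 1.265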
Which attribute splits the data more homogeneously?
Weighted entropies of the four candidate splits (class counts as above):
- credit history: ≈ 1.265 bits
- debt: ≈ 1.468 bits
- collateral: ≈ 1.325 bits
- income: ≈ 0.564 bits
The attribute with the lowest entropy is chosen: income.
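Applying the same weighted-entropy computation to all four candidate splits (a sketch using the class counts above; the entropy helper repeats the earlier definition):

import math

def entropy(counts):
    m = sum(counts)
    return -sum((c / m) * math.log2(c / m) for c in counts if c > 0)

splits = {
    "credit history": [[0, 1, 3], [2, 1, 2], [3, 1, 1]],
    "debt":           [[3, 2, 2], [2, 1, 4]],
    "collateral":     [[3, 2, 6], [2, 1, 0]],
    "income":         [[0, 0, 4], [0, 2, 2], [5, 1, 0]],
}
for name, branches in splits.items():
    n = sum(sum(b) for b in branches)
    score = sum((sum(b) / n) * entropy(b) for b in branches)
    print(f"{name}: {score:.3f} bits")
# income scores lowest (about 0.564 bits), so it becomes the root.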
Constructing a decision tree
[Partial tree: income is the root. The income = 0-15 branch is pure, so it becomes the leaf "prediction: high"; the income = 15-35 and income > 35 branches are still undecided (?).]
Splitting instances with income = 15-35
Class counts [low, moderate, high] for each candidate split of these 4 instances:
- credit history: bad [0,0,1], unknown [0,1,1], good [0,1,0] (entropy ≈ 0.5 bits)
- debt: low [0,1,0], high [0,1,2] (entropy ≈ 0.689 bits)
- collateral: none [0,2,2], adequate [0,0,0] (entropy = 1 bit)
Credit history is the attribute with the lowest entropy: its bad branch is pure (prediction: high) and its good branch is pure (prediction: moderate).
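The same selection step, sketched in Python for the 4 instances on the income = 15-35 branch; the guard for an empty branch (collateral = adequate has no instances here) is an implementation detail, not from the slides:

import math

def entropy(counts):
    m = sum(counts)
    if m == 0:          # empty branch contributes nothing
        return 0.0
    return -sum((c / m) * math.log2(c / m) for c in counts if c > 0)

# Candidate splits of the 4 instances with income = 15-35,
# class counts ordered [low, moderate, high]:
splits = {
    "credit history": [[0, 0, 1], [0, 1, 1], [0, 1, 0]],  # bad, unknown, good
    "debt":           [[0, 1, 0], [0, 1, 2]],             # low, high
    "collateral":     [[0, 2, 2], [0, 0, 0]],             # none, adequate
}
for name, branches in splits.items():
    n = sum(sum(b) for b in branches)
    score = sum((sum(b) / n) * entropy(b) for b in branches)
    print(f"{name}: {score:.3f} bits")
# credit history scores lowest (0.5 bits); its bad and good branches are pure.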
Constructing a decision tree
[Partial tree: income at the root; income = 0-15 → prediction: high; income = 15-35 → a credit-history subtree; the remaining branches (shown as … on the slide) are completed in the same way.]
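Putting the steps together, a minimal ID3-style sketch of the whole recursive procedure; the 14 training examples themselves are not listed in these slides, so rows (a list of attribute-to-value dicts) and labels are left to the caller, and all names here are illustrative:

import math
from collections import Counter

def entropy(labels):
    # Entropy in bits of a list of class labels.
    m = len(labels)
    return -sum((c / m) * math.log2(c / m) for c in Counter(labels).values())

def build_tree(rows, labels, attributes):
    # Leaf: all instances agree, or no attributes remain (use majority class).
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Weighted entropy of splitting on attribute a.
    def split_entropy(a):
        total = 0.0
        for v in set(row[a] for row in rows):
            sub = [lab for row, lab in zip(rows, labels) if row[a] == v]
            total += (len(sub) / len(labels)) * entropy(sub)
        return total
    # Pick the attribute with the lowest weighted entropy and recurse.
    best = min(attributes, key=split_entropy)
    subtree = {}
    for v in set(row[best] for row in rows):
        sub_rows = [row for row in rows if row[best] == v]
        sub_labels = [lab for row, lab in zip(rows, labels) if row[best] == v]
        subtree[v] = build_tree(sub_rows, sub_labels,
                                [a for a in attributes if a != best])
    return (best, subtree)

Called on the 14 training examples with attributes ['credit history', 'debt', 'collateral', 'income'], this procedure would pick income at the root and then credit history on the income = 15-35 branch, matching the trace above.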