DECISION TREE Ge Song
Introduction
■ Decision Tree: a supervised learning algorithm used for classification or regression.
■ Decision Tree Graph: a graph that uses branching to illustrate every possible outcome of a decision.
[Figure: decision tree, input → output]
■ Each internal node: tests one discrete-valued attribute Xi
■ Each branch from a node: selects one value for Xi
■ Each leaf node: predicts Y (or P(Y | X ∈ leaf))
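The node, branch, and leaf roles described above can be sketched as a small data structure. This is only an illustration, not the implementation behind these slides: the class names `Node` and `Leaf` and the function `predict` are hypothetical, and the example tree reuses the attribute names from the Result slide, with each leaf labeled by its majority class.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    # A leaf predicts a value of Y.
    label: str


@dataclass
class Node:
    # An internal node tests one discrete-valued attribute Xi;
    # each key in `branches` is one value of Xi.
    attribute: str
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def predict(tree, example):
    # Follow the branch matching the example's attribute value until a leaf.
    while isinstance(tree, Node):
        tree = tree.branches[example[tree.attribute]]
    return tree.label


# Hypothetical encoding of the tree shown on the Result slide.
tree = Node("H01420", {
    "yes": Node("D12765", {"yes": Leaf("+"), "no": Leaf("-")}),
    "no": Node("D14662", {"yes": Leaf("-"), "no": Leaf("+")}),
})

print(predict(tree, {"H01420": "yes", "D12765": "no"}))
```

Only the attributes on the path from the root to a leaf are read, so an example does not need a value for every attribute in the tree.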
ID3 Algorithm
■ For every attribute, calculate the entropy of the data set after splitting on that attribute
■ Split the set into subsets using the attribute for which this entropy is minimum (or, equivalently, information gain is maximum)
■ Make a decision tree node containing that attribute
■ Recurse on the subsets using the remaining attributes
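The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for these slides: rows are assumed to be dicts of discrete attribute values, the tree is returned as nested dicts, and `min_gain` is a hypothetical stopping parameter (the slides mention a split threshold but do not say what quantity it is applied to).

```python
import math
from collections import Counter


def entropy(labels):
    # H(Y) = -sum_y p(y) * log2 p(y)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    # Gain = H(Y) - sum_v P(attr = v) * H(Y | attr = v)
    total = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [y for row, y in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attrs, min_gain=0.0):
    majority = Counter(labels).most_common(1)[0][0]
    # Stop on a pure subset or when no attributes remain.
    if len(set(labels)) == 1 or not attrs:
        return majority
    # Pick the attribute with maximum information gain.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    if information_gain(rows, labels, best) <= min_gain:
        return majority
    # Make a node for that attribute and recurse on each subset.
    tree = {best: {}}
    remaining = [a for a in attrs if a != best]
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                remaining, min_gain)
    return tree


rows = [{"a": "yes", "b": "yes"}, {"a": "yes", "b": "no"},
        {"a": "no", "b": "yes"}, {"a": "no", "b": "no"}]
labels = ["+", "+", "-", "-"]
print(id3(rows, labels, ["a", "b"]))
```

In this toy data set attribute `a` determines the label and `b` carries no information, so the sketch splits once on `a` and stops.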
Result and Limitations
■ Result:
[18+/13-]
H01420 = yes: [2+/12-]
|   D12765 = yes: [2+/0-]
|   D12765 = no: [0+/12-]
H01420 = no: [16+/1-]
|   D14662 = yes: [0+/1-]
|   D14662 = no: [16+/0-]
testError: 0.4
■ Limitations:
1) The rule (average of max and min) used to convert attribute values to binary values;
2) The threshold (0.1) used to decide whether to split;
3) The depth of the tree and its robustness;
4) The cross-validation step.
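The binarization rule in limitation 1 can be illustrated as follows. This is a sketch under assumptions: the slides only say the cut point is the average of the maximum and minimum, so the function name `binarize` and the choice to map values above the cut point to "yes" are hypothetical.

```python
def binarize(values):
    # Cut point: average of the largest and smallest observed value.
    threshold = (max(values) + min(values)) / 2
    # Assumed direction: values above the cut point become "yes".
    return ["yes" if v > threshold else "no" for v in values]


print(binarize([0.0, 1.0, 4.0]))
```

Because the cut point depends only on the two extreme values, a single outlier moves it for every example, which is one reason this rule is listed as a limitation.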