Machine Learning Decision Trees. Exercise Solutions
Exercise 1 a) Machine learning methods are often categorised into three main types: supervised, unsupervised and reinforcement learning. Explain each in no more than one sentence, then state which category Decision Tree Learning falls into and why.
Answer
- Supervised learning is learning with a teacher: input-output examples are given to the system in the training phase, and after training the system is asked to predict the output for new inputs. E.g. classification.
- Unsupervised learning is learning for structure discovery with no teacher: only input data are seen in both the training and the testing phase. E.g. ICA, clustering.
- Reinforcement learning is learning with no teacher but with feedback from the environment; the feedback consists of rewards, which are typically delayed. E.g. Q-learning.
Decision Trees are supervised learning methods: they learn to classify from given input-output examples.
c) For the sunbathers example given in the lecture, calculate the Disorder function for the attribute ‘height’ at the root node.
Disorder of height
Height (target: is_sunburned)
  Short: Alex, Annie, Katie
  Average: Sarah, Emily, John
  Tall: Dana, Pete
Disorder of height (contd)
  Short: Alex, Annie, Katie
  Average: Sarah, Emily, John
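The calculation for 'height' can be sketched in a few lines of Python. This assumes that the lecture's Disorder function is the usual entropy-based average disorder (branch entropies in base 2, weighted by branch size), and it assumes the class counts of the standard sunbathers data set: one of the three Short people is sunburned, two of the three Average people are sunburned, and neither Tall person is. The function names are illustrative only.

```python
from math import log2

def disorder(counts):
    """Entropy (base 2) of a list of class counts, e.g. [burned, not_burned]."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def average_disorder(branches):
    """Weighted average of the branch disorders; weights are branch sizes."""
    n = sum(sum(b) for b in branches)
    return sum(sum(b) / n * disorder(b) for b in branches)

# [sunburned, not sunburned] per branch. These counts are an assumption
# taken from the standard sunbathers data, not from the slide text itself.
height = {"Short": [1, 2], "Average": [2, 1], "Tall": [0, 2]}

height_disorder = average_disorder(list(height.values()))
print(round(height_disorder, 3))  # 0.689
```

Under these assumptions only the Short and Average branches contribute, each with weight 3/8; the Tall branch is homogeneous and adds nothing.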
Exercise 2 For the sunbathers example given in the lecture, calculate the Disorder function associated with the possible branches of the decision tree once the root node (hair colour) has been chosen.
Answer: 1st branch
Hair colour = Blonde: Sarah, Annie, Dana, Katie
Candidate splits (target: is_sunburned)
  Height
    Short: Annie, Katie
    Average: Sarah
    Tall: Dana
  Weight
    Light: Sarah, Katie
    Average: Annie, Dana
  Lotion used
    No: Sarah, Annie
    Yes: Dana, Katie
- So in this branch (the 1st branch) we find that 'Lotion used' is the next attribute to split on.
- We also find that after this split the branch is finished: each resulting subset is homogeneous.
- The method of computation for the other two branches (red and brown) is exactly the same.
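The choice of 'Lotion used' for the blonde branch can be checked with a short sketch. It assumes the entropy-based average disorder and the labels of the standard sunbathers data (Sarah and Annie sunburned, Dana and Katie not); the record layout and helper names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

# name: (height, weight, lotion used, sunburned).
# The sunburned labels are an assumption from the standard data set.
blondes = {
    "Sarah": ("Average", "Light", "No", True),
    "Annie": ("Short", "Average", "No", True),
    "Dana": ("Tall", "Average", "Yes", False),
    "Katie": ("Short", "Light", "Yes", False),
}
ATTRS = {"Height": 0, "Weight": 1, "Lotion used": 2}

def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def average_disorder(rows, idx):
    """Size-weighted entropy of the subsets produced by splitting on column idx."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[idx]].append(row[-1])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

rows = list(blondes.values())
scores = {attr: average_disorder(rows, i) for attr, i in ATTRS.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

With these labels the scores come out as 0.5 for Height, 1.0 for Weight and 0 for Lotion used, so splitting on lotion leaves two pure subsets.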
Exercise 3 Using the decision tree learning algorithm, calculate the decision tree for the following data set.
Data for Exercise 3
Ex 3: Search for Root. Candidate: Hair Colour
Hair colour (target: is_sunburned)
  Blonde: Sarah, Annie, Dana, Julie, Ruth
  Brown: Alex, Pete, John
Av Disorder = (5/8) * Disorder(Blonde)

Ex 3: Search for Root. Candidate: Height
Height (target: is_sunburned)
  Short: Alex, Annie
  Average: Sarah, Julie, John, Ruth
  Tall: Dana, Pete
Av Disorder = 1/4 + (1/2) * Disorder(Average)

Ex 3: Search for Root. Candidate: Weight
Weight (target: is_sunburned)
  Light: Sarah, Julie, Ruth
  Average: Dana, Alex, Annie
  Heavy: Pete, John
Av Disorder = 2 * (3/8) * Disorder(Light), since the Light and Average branches have the same disorder

Ex 3: Search for Root. Candidate: Lotion
Lotion used (target: is_sunburned)
  Yes: Dana, Alex
  No: Sarah, Annie, Julie, Pete, John, Ruth
Av Disorder = (3/4) * Disorder(No)
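The root search can be reproduced numerically with a small sketch. The class counts below are not read from the data table; they are inferred from the fractions in the four Av Disorder expressions and from the leaves of the resulting tree (Brown is all 'No', two of the five blondes are sunburned), so treat them as an assumption. Disorder is again taken to be base-2 entropy, and all names are illustrative.

```python
from math import log2

def disorder(yes, no):
    """Base-2 entropy of a branch with the given class counts."""
    total = yes + no
    return -sum(k / total * log2(k / total) for k in (yes, no) if k)

def average_disorder(branches):
    """Size-weighted average of the branch disorders."""
    n = sum(y + m for y, m in branches)
    return sum((y + m) / n * disorder(y, m) for y, m in branches)

# (sunburned, not sunburned) per branch; inferred counts, see the note above.
candidates = {
    "Hair colour": [(2, 3), (0, 3)],     # Blonde, Brown
    "Height": [(1, 1), (1, 3), (0, 2)],  # Short, Average, Tall
    "Weight": [(1, 2), (1, 2), (0, 2)],  # Light, Average, Heavy
    "Lotion used": [(0, 2), (2, 4)],     # Yes, No
}

scores = {name: average_disorder(b) for name, b in candidates.items()}
root = min(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("root:", root)
```

Under these counts the scores are roughly 0.607 (Hair colour), 0.656 (Height), 0.689 (Weight) and 0.689 (Lotion used), so Hair colour has the lowest average disorder and becomes the root.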
Ex 3: Next
Hair colour (target: is_sunburned)
  Brown -> No
  Blonde: Sarah, Annie, Dana, Julie, Ruth -> next split?
Candidate splits for the Blonde branch
  Height ?
    Short: Annie
    Av: Sarah, Julie, Ruth
    Tall: Dana
  Weight ?
    Light: Sarah, Julie, Ruth
    Av: Annie, Dana
    Heavy: (none)
  Lotion used ?
    No: Sarah, Annie, Julie, Ruth
    Yes: Dana
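Ex 3: Next
Hair colour (target: is_sunburned)
  Brown -> No
  Blonde -> Height
    Short -> Yes
    Av -> Sarah, Julie, Ruth
    Tall -> No
No further split will improve the classification accuracy on the training data, because Sarah, Julie and Ruth share the same values for the remaining attributes (all Light, all without lotion). We can assign a decision to this leaf node based on the majority. That gives a 'No'.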
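The majority rule for the mixed leaf, and the finished tree, can be sketched as follows. The leaf labels assume that exactly one of Sarah, Julie and Ruth is sunburned (the text only says the majority is 'No'); the nested-dictionary tree format and the function names are hypothetical choices for illustration.

```python
from collections import Counter

def majority_label(labels):
    """Return the most common class label in a leaf."""
    return Counter(labels).most_common(1)[0][0]

# Assumed labels of the Blonde / Average-height leaf: one 'Yes', two 'No'.
leaf_labels = ["Yes", "No", "No"]
leaf = majority_label(leaf_labels)

# Final tree: an inner node is {attribute: {value: subtree_or_label}}.
tree = {
    "Hair colour": {
        "Brown": "No",
        "Blonde": {"Height": {"Short": "Yes", "Average": leaf, "Tall": "No"}},
    }
}

def classify(node, example):
    """Follow the example's attribute values down to a leaf label."""
    while isinstance(node, dict):
        attr, branches = next(iter(node.items()))
        node = branches[example[attr]]
    return node

print(classify(tree, {"Hair colour": "Blonde", "Height": "Average"}))  # No
```

A majority leaf accepts some training error: under the assumption above, one of the three people in that leaf is misclassified.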