Published by Clare Singleton. Modified over 9 years ago.
1
Exercises: Decision Trees. Exercise 1 In decision tree learning, the information gain criterion helps us select the best attribute to split the data at every node. Information gain can be summarized as: Gain(S,A) = Entropy(S) - (weighted average entropy of the subsets induced by A). Notice that the first term, Entropy(S), is the same for all attributes. If this is the case, why do we need it? Why not simply choose the attribute that maximizes the negated second term (that is, minimizes the weighted average entropy induced by A)?
2
Exercise 2 In decision tree learning we assign one attribute to each internal node of the tree, normally by choosing the attribute with the maximum value of a certain quality metric (e.g., information gain or gain ratio). Assume you have only binary (Boolean) attributes and that you have been asked to modify the mechanism of decision trees by assigning two attributes instead of one to each internal node. Each pair of attributes will be joined by the logical operator AND. For example, let's assume we have three attributes A1, A2, and A3. Our candidates for a tree node are A1&A2, A1&A3, or A2&A3. Answer the following questions: How many branches would come out of each internal node? Can we use information gain or gain ratio to choose the best pair of attributes (i.e., conjunction of attributes)? Explain.
4
Question Part 1 The first term, Entropy(S), is the entropy of the data before splitting, so the difference between it and the second term tells us how much a split reduces entropy. Minimizing the second term does select the same attribute as maximizing the gain, but if we only use the second term we miss relevant information: if the difference between the first and second term is very small, then it is not worth splitting the data any further.
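The point above can be checked numerically. The following is a minimal sketch, not part of the original slides: the toy data, the helper names, and the MIN_GAIN threshold are all hypothetical. It shows that maximizing the gain and minimizing the weighted entropy pick the same attribute, while only the gain tells us whether a split is worth making.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def weighted_entropy(rows, labels, attr):
    """Entropy of the subsets induced by `attr`, weighted by subset size."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    total = len(labels)
    return sum(len(s) / total * entropy(s) for s in subsets.values())

def information_gain(rows, labels, attr):
    """Gain(S, A) = Entropy(S) - weighted entropy induced by A."""
    return entropy(labels) - weighted_entropy(rows, labels, attr)

# Hypothetical toy data: A1 determines the class, A2 is uninformative.
rows = [
    {"A1": 1, "A2": 0},
    {"A1": 1, "A2": 1},
    {"A1": 0, "A2": 0},
    {"A1": 0, "A2": 1},
]
labels = ["yes", "yes", "no", "no"]
attrs = ["A1", "A2"]

# Entropy(S) is constant across attributes, so both rankings agree.
best_by_gain = max(attrs, key=lambda a: information_gain(rows, labels, a))
best_by_entropy = min(attrs, key=lambda a: weighted_entropy(rows, labels, a))

# The first term matters for deciding whether to split at all.
MIN_GAIN = 0.01  # assumed stopping threshold, chosen only for illustration
worth_splitting = {a: information_gain(rows, labels, a) > MIN_GAIN
                   for a in attrs}
```

In this toy set, splitting on A2 leaves the entropy unchanged, so its gain falls below the threshold even though its weighted entropy alone gives no hint of that without Entropy(S) to compare against.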
5
Question Part 2 How many branches would come out of each internal node? Answer: 2, because the conjunction of two Boolean attributes is itself Boolean (true or false). Can we use information gain or gain ratio to choose the best pair of attributes (i.e., conjunction of attributes)? Explain. Answer: Yes, each conjunction would stand as a new feature with two values. Both metrics are perfectly valid in this setting.
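As an illustration of this answer, the sketch below treats each conjunction as a new two-valued feature and scores it with information gain and gain ratio. It is a hedged example, not the slides' own method: the data set (all eight combinations of A1, A2, A3), the target concept A1 AND A3, and the function names are assumptions made for the demonstration.

```python
import math
from collections import Counter
from itertools import combinations, product

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())

def score_feature(feature_values, labels):
    """Return (information gain, gain ratio, number of branches)."""
    subsets = {}
    for v, y in zip(feature_values, labels):
        subsets.setdefault(v, []).append(y)
    total = len(labels)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    gain = entropy(labels) - remainder
    # Split information is the entropy of the feature's own value distribution.
    split_info = entropy(feature_values)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio, len(subsets)

names = ["A1", "A2", "A3"]
# Hypothetical data: every combination of three Boolean attributes.
rows = [dict(zip(names, bits)) for bits in product([False, True], repeat=3)]
# Assumed target concept for the demonstration: A1 AND A3.
labels = [r["A1"] and r["A3"] for r in rows]

scores = {}
for a, b in combinations(names, 2):
    # The conjunction a AND b is a new Boolean feature.
    conj = [r[a] and r[b] for r in rows]
    scores[(a, b)] = score_feature(conj, labels)

best_pair = max(scores, key=lambda pair: scores[pair][0])
branch_counts = {pair: s[2] for pair, s in scores.items()}
```

Every candidate pair yields exactly two branches here, and the pair matching the assumed target concept receives the highest gain, which is the behavior the answer describes.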