Published by Alice Owens. Modified over 9 years ago.
1
Decision Trees
2
MS Algorithms
3
Decision Trees
The basic idea: create a series of splits, also called nodes, in the tree. The algorithm adds a node to the model every time an input column is found to be significantly correlated with the predictable column.
4
Predicting Discrete Columns
5
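More on Decision Trees
Each internal node (including the root) represents a test.
Each leaf node represents a class.
[Figure: example decision tree. The root tests age, with branches <= 30, 31..40, and > 40. The <= 30 branch tests Student (y / n), the 31..40 branch ends in a Yes leaf, and the > 40 branch tests credit (Fair / excellent). Leaves are labeled Yes or No.]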
6
Some facts
Probably the most popular data mining algorithm.
We have been using it without knowing it.
A path from the root to a leaf node forms a rule.
Prediction is efficient.
Shapes and sizes can be controlled.
C4.5 can handle numeric attributes, missing values, and noisy data.
MS called their algorithm Decision Trees because it:
  Combines many different algorithms.
  May generate many trees in one model.
  Can predict a nested column.
  Can predict many columns.
  Can predict a continuous column.
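To make "a path from the root to a leaf forms a rule" concrete, here is a minimal Python sketch. The tree is a hypothetical one in the spirit of the age / student / credit example; its leaf labels are assumed for illustration, and the names `extract_rules` and `predict` are made up for this sketch.

```python
# Hypothetical tree: (attribute, {branch value: subtree or leaf label}).
# The leaf labels are assumptions made for illustration only.
tree = ("age", {
    "<=30": ("student", {"no": "No", "yes": "Yes"}),
    "31..40": "Yes",
    ">40": ("credit", {"excellent": "No", "fair": "Yes"}),
})


def extract_rules(node, conditions=()):
    """Return one (conditions, label) pair per root-to-leaf path."""
    if isinstance(node, str):  # a leaf holds a class label
        return [(conditions, node)]
    attribute, branches = node
    rules = []
    for value, child in branches.items():
        rules.extend(extract_rules(child, conditions + ((attribute, value),)))
    return rules


def predict(node, case):
    """Follow one test per level from the root until a leaf is reached."""
    while not isinstance(node, str):
        attribute, branches = node
        node = branches[case[attribute]]
    return node


for conditions, label in extract_rules(tree):
    tests = " AND ".join(f"{a} = {v}" for a, v in conditions)
    print(f"IF {tests} THEN {label}")
```

Prediction only walks a single path, so its cost grows with the depth of the tree rather than with the number of training cases, which is one way to read "prediction is efficient".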
7
Growing the tree
1. Count the correlation of each attribute with the predicted column. For example, IQ can be H, M, or L, each with a count of cases attending college or not.
2. Select an internal node based on, say, an entropy calculation.
3. Recursively work on each possible branch until all the attributes are considered.
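The three steps above can be sketched in Python roughly as follows. This is a simplified ID3-style illustration, not the Microsoft algorithm: the toy rows, the `encouraged` attribute, and the function names are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def split_entropy(rows, attribute, target):
    """Steps 1 and 2: count the target per attribute state, then weight each state's entropy."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())


def grow(rows, attributes, target):
    """Step 3: recurse on each branch until the labels are pure or attributes run out."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    best = min(attributes, key=lambda a: split_entropy(rows, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = grow(subset, remaining, target)
    return (best, branches)


# Made-up training cases, for illustration only.
rows = [
    {"iq": "H", "encouraged": "yes", "college": "yes"},
    {"iq": "H", "encouraged": "no", "college": "yes"},
    {"iq": "M", "encouraged": "yes", "college": "yes"},
    {"iq": "M", "encouraged": "no", "college": "no"},
    {"iq": "L", "encouraged": "yes", "college": "no"},
    {"iq": "L", "encouraged": "no", "college": "no"},
]
tree = grow(rows, ["iq", "encouraged"], "college")
print(tree)
```

On this toy data the IQ split leaves less weighted entropy than the `encouraged` split, so IQ becomes the root and only the mixed M branch needs a second test.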
8
Entropy
The entropy concept was developed from the study of thermodynamic systems. The second law of thermodynamics states that in any irreversible process, entropy always increases. Entropy is a measure of disorder, so the second law says that in any irreversible process the disorder in the universe increases. For a split in a tree, the smaller the entropy, the better. Since virtually all natural processes are irreversible, the entropy law implies that the universe is "running down". Order, patterns, and structure all gradually disintegrate into random disorder. The direction of time is from order to chaos.
9
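Characteristics
If the outcome is fully determined (every case falls in one state), entropy is zero.
If the two states are equally likely, entropy is at its maximum.
With more than two states, different orders of calculation should give the same result. In the IQ H, M, L case, you can first split H from not-H and then M from L (weighting the second step by the share of not-H cases), or you can calculate over H, M, L directly; the result should be the same.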
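These three properties can be checked numerically. The sketch below uses Shannon entropy in bits; the H, M, L proportions are assumed values chosen only to illustrate the grouping property.

```python
from math import isclose, log2


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


# 1. A fully determined outcome has zero entropy.
determined = entropy([1.0, 0.0])

# 2. Two equally likely states give the two-state maximum of 1 bit.
balanced = entropy([0.5, 0.5])

# 3. Grouping: H vs. not-H first, then M vs. L inside not-H,
#    matches the direct three-state calculation.
p_h, p_m, p_l = 0.5, 0.3, 0.2  # assumed proportions, for illustration
direct = entropy([p_h, p_m, p_l])
p_not_h = p_m + p_l
staged = entropy([p_h, p_not_h]) + p_not_h * entropy([p_m / p_not_h, p_l / p_not_h])

print(isclose(direct, staged))
```

The staged calculation only matches when the second step is weighted by `p_not_h`; adding the two entropies without that weight would overstate the total.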
10
Steps
1. Build a correlation count table.
2. Calculate entropy (or another measure).
11
Examples in book
What the book means by Entropy(700, 400).
Check all.
Why pick the one with the lowest entropy?

Count   p          log2(p)    p * log2(p)
400     0.363636   -1.45943   -0.5307
700     0.636364   -0.65208   -0.41496
1100                          Entropy = 0.94566
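The figures for Entropy(700, 400) can be reproduced with a few lines of Python. This is a sketch that assumes Shannon entropy with base-2 logarithms and nonzero counts; `entropy_from_counts` is a name made up for the example.

```python
from math import log2


def entropy_from_counts(counts):
    """Return the per-state rows (count, p, log2 p, p * log2 p) and the entropy in bits."""
    total = sum(counts)
    rows = []
    for n in counts:
        p = n / total
        rows.append((n, p, log2(p), p * log2(p)))
    return rows, -sum(term for _, _, _, term in rows)


rows, h = entropy_from_counts([400, 700])
for n, p, lg, term in rows:
    print(f"{n:>5} {p:.6f} {lg:.5f} {term:.5f}")
print(f"{sum(n for n, *_ in rows):>5} entropy = {h:.5f}")
```

Each state contributes p * log2(p), and the entropy is the negated sum of those contributions, which is why the last column holds negative numbers while the total is positive.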
12
Issue with attributes of many states
Zip code has many states.
Ignore it.
Keep it as is, so the tree is not very good.
Group the values by:
  Locations
  Characteristics: population, beach access, economic
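As a rough illustration of the grouping option, a many-state column can be replaced by a coarser one before the tree is trained. Everything in this Python sketch is hypothetical: the zip codes, the group labels, and the `group_zip` helper are made up for the example.

```python
# Hypothetical lookup table; every zip code and group label here is made up.
ZIP_GROUPS = {
    "00001": "coastal",
    "00002": "coastal",
    "00003": "inland",
}


def group_zip(zip_code, default="other"):
    """Map a raw zip code to its group, falling back to a catch-all state."""
    return ZIP_GROUPS.get(zip_code, default)


cases = [{"zip": "00001"}, {"zip": "00003"}, {"zip": "99999"}]
for case in cases:
    # Add the grouped column; the model would then use zip_group instead of zip.
    case["zip_group"] = group_zip(case["zip"])

print(cases)
```

The grouped column has only a handful of states, so each branch of a split keeps enough cases to give a meaningful count.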
13
Over Training
The size of the tree has no direct relation to the quality of the prediction.
A big tree sometimes only reflects the training data; this is called over training and should be avoided.
14
Parameters
Parameter: Description
MAXIMUM_INPUT_ATTRIBUTES: Defines the number of input attributes that the algorithm can handle before it invokes feature selection. Set this value to 0 to turn off feature selection. The default is 255.
MAXIMUM_OUTPUT_ATTRIBUTES: Defines the number of output attributes that the algorithm can handle before it invokes feature selection. Set this value to 0 to turn off feature selection. The default is 255.
SCORE_METHOD: Determines the method that is used to calculate the split score. Available options: Entropy (1), Bayesian with K2 Prior (2), or Bayesian Dirichlet Equivalent (BDE) Prior (3). The default is 3.
SPLIT_METHOD: Determines the method that is used to split the node. Available options: Binary (1), Complete (2), or Both (3). The default is 3.
MINIMUM_SUPPORT: Determines the minimum number of leaf cases that is required to generate a split in the decision tree. The default is 10.
COMPLEXITY_PENALTY: Controls the growth of the decision tree. A low value increases the number of splits, and a high value decreases the number of splits. The default value is based on the number of attributes for a particular model:
  For 1 through 9 attributes, the default is 0.5.
  For 10 through 99 attributes, the default is 0.9.
  For 100 or more attributes, the default is 0.99.
FORCED_REGRESSOR: Forces the algorithm to use the indicated columns as regressors, regardless of the importance of the columns as calculated by the algorithm. This parameter is only used for decision trees that are predicting a continuous attribute.
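The COMPLEXITY_PENALTY default rule can be written out as a small Python function. This is only a restatement of the list in the parameter description; `default_complexity_penalty` is a hypothetical helper, not part of any Microsoft API.

```python
def default_complexity_penalty(attribute_count):
    """Default COMPLEXITY_PENALTY for a model with the given number of attributes."""
    if attribute_count < 1:
        raise ValueError("a model needs at least one attribute")
    if attribute_count <= 9:
        return 0.5
    if attribute_count <= 99:
        return 0.9
    return 0.99


for count in (5, 50, 500):
    print(count, default_complexity_penalty(count))
```

The default penalty rises with the number of attributes, so models with many attributes make fewer splits unless the parameter is lowered explicitly.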