Decision trees MARIO REGIN.


1 Decision trees MARIO REGIN

2 What is a decision tree?
A general-purpose prediction and classification mechanism.
It emerged at the same time as the nascent fields of artificial intelligence and statistical computation.

3 Nomenclature
[Figure: an example tree labeled with the terms root node, branch, level, input, and nodes. The root node (Node 1: 1: 38.2%, 0: 61.8%, count 1309) branches on the input Gender into the levels Female (Node 23: 1: 72.7%, 0: 27.3%, count 466) and Male (Node 24: 1: 19.1%, 0: 80.9%, count 843).]

4 Main Characteristic
Decision trees recursively subset the data according to the values of the input fields (predictors). Each split creates partitions, and the associated descendant data subsets, whose target values are progressively more similar within a node and progressively more dissimilar between nodes at the same level of the tree. In other words, decision trees reduce the disorder of the target as we go from higher to lower levels of the tree.
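One common way to measure this "disorder" is entropy. As a rough check, assuming entropy in bits and using the rounded percentages of the Gender split shown on slide 3, the count-weighted disorder of the two child nodes is lower than that of the root. The function names below are illustrative, not from the presentation.

```python
import math

def entropy(p: float) -> float:
    """Binary entropy in bits of a node whose class-1 proportion is p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def weighted_entropy(children: list[tuple[int, float]]) -> float:
    """Count-weighted average entropy of child nodes given as (count, p)."""
    total = sum(count for count, _ in children)
    return sum(count / total * entropy(p) for count, p in children)

# Root node and Gender split as reported on the slides (rounded percentages).
root = entropy(0.382)
after_split = weighted_entropy([(466, 0.727), (843, 0.191)])
print(f"root: {root:.3f} bits, after Gender split: {after_split:.3f} bits")
```

With these figures the root comes out at roughly 0.96 bits and the two children at roughly 0.75 bits on average, so the split reduces the disorder.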

5 Creating a Decision Tree
1.- Create the root node.
2.- Search through the data set to discover the best partitioning input for the current node.
3.- Use the best input to partition the current node and form branches.
4.- Repeat steps 2 and 3 on each new node until one or more stopping conditions are met.
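The four steps map naturally onto a recursive function. The sketch below is a minimal ID3-style version, assuming categorical inputs, entropy as the disorder measure, and a hypothetical `min_count` threshold as one of the stop conditions; the data is a hypothetical toy set loosely modeled on the vampire example later in the deck.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def split_disorder(rows, field, target):
    """Count-weighted entropy of the child nodes produced by splitting on `field`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[field], []).append(row[target])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

def build_tree(rows, inputs, target, min_count=2):
    """Step 1 is the top-level call; returns a leaf label or (field, {value: subtree})."""
    labels = [row[target] for row in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Step 4: stop on a pure node, when no inputs remain, or with too few cases.
    if len(set(labels)) == 1 or not inputs or len(rows) < min_count:
        return majority
    # Step 2: the best input is the one whose split leaves the least disorder.
    best = min(inputs, key=lambda f: split_disorder(rows, f, target))
    remaining = [f for f in inputs if f != best]
    # Step 3: one branch per observed value, then recurse on each new node.
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, remaining, target, min_count)
    return (best, branches)

# Hypothetical toy data.
rows = [
    {"shadow": "Y", "garlic": "Y", "vampire": "No"},
    {"shadow": "Y", "garlic": "N", "vampire": "No"},
    {"shadow": "N", "garlic": "N", "vampire": "Yes"},
    {"shadow": "?", "garlic": "Y", "vampire": "No"},
    {"shadow": "?", "garlic": "N", "vampire": "Yes"},
]
print(build_tree(rows, ["shadow", "garlic"], "vampire"))
```

Each input is used at most once per path here, which is a simplification suited to categorical inputs; numeric inputs would need cut-points instead.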

6 Best Input
The selection of the ‘best’ input field is an open subject of active research. Decision trees allow for a variety of computational approaches to input selection. Some possible approaches to selecting the best input:
High performance predictive model approach.
Relationship analysis approach.

7 High Performance Predictive Model Approach
Choose the input that produces the greatest separation in target variability among the descendant nodes. The emphasis is on selecting high-quality partitions that collectively produce the best overall model.
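For a numeric target, one possible way to measure "separation in the variability" (an assumption here, not necessarily the presenter's measure) is variance reduction: the variance of the parent node minus the count-weighted variance of its children. A minimal sketch with hypothetical field names and data:

```python
from statistics import pvariance

def variance_reduction(rows, field, target):
    """Drop in count-weighted variance of `target` when splitting on `field`."""
    parent = pvariance([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[field], []).append(r[target])
    children = sum(len(g) / len(rows) * pvariance(g) for g in groups.values())
    return parent - children

def best_input(rows, inputs, target):
    """Pick the input whose split separates the target variability the most."""
    return max(inputs, key=lambda f: variance_reduction(rows, f, target))

# Hypothetical data: a numeric target y and two candidate inputs a and b.
rows = [
    {"a": "x", "b": "p", "y": 1.0},
    {"a": "x", "b": "q", "y": 2.0},
    {"a": "z", "b": "p", "y": 9.0},
    {"a": "z", "b": "q", "y": 10.0},
]
print(best_input(rows, ["a", "b"], "y"))  # → a
```

Input `a` groups the low and high target values apart, so its children are almost uniform; for a class target, entropy or the Gini index plays the same role.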

8 Relationship Analysis Approach
Guide the branching in order to analyze, discover, support, or confirm the conditional relationships that are assumed to exist among the various inputs and the component nodes they produce. The emphasis is on analyzing the interactions that shape the formation of the tree.

9 Stopping Rule
Generally, stopping rules consist of thresholds on diminishing returns (in terms of test statistics) or on a diminishing supply of training cases (a minimum acceptable number of observations in a node).
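Both kinds of threshold can be combined into a single check applied before splitting a node. The sketch below uses hypothetical names and default values, and measures "returns" as the reduction in disorder rather than a formal test statistic; scikit-learn's `DecisionTreeClassifier` exposes comparable settings as `min_samples_split`, `min_samples_leaf` and `min_impurity_decrease`.

```python
from dataclasses import dataclass

@dataclass
class StopRule:
    """Hypothetical stopping thresholds."""
    min_node_count: int = 10   # minimum cases a node needs before it may be split
    min_child_count: int = 5   # minimum cases allowed in each child node
    min_gain: float = 0.01     # minimum reduction in disorder worth a split

    def stop(self, node_count: int, child_counts: list[int], gain: float) -> bool:
        """Return True if the node should become a leaf."""
        too_few_cases = node_count < self.min_node_count
        tiny_child = min(child_counts) < self.min_child_count
        diminishing_returns = gain < self.min_gain
        return too_few_cases or tiny_child or diminishing_returns

rule = StopRule()
print(rule.stop(node_count=1309, child_counts=[466, 843], gain=0.2))  # → False
print(rule.stop(node_count=8, child_counts=[4, 3, 1], gain=0.45))     # → True
```

The second call stops only because the node is small, which shows why a small training set like the 8-case example needs much lower thresholds.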

10 Tree Construction

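11 Tree Construction
Training set: 8 cases with the inputs Cast Shadow (?, Y, N), Likes Garlic (Y, N) and Complexion (Pale, Healthy, Average), and the target Vampire.
Node 1 (root): 1: 37.5%, 0: 62.5%, count 8
Candidate splits of the root, with the counts of non-vampires (N) and vampires (Y) in each branch:
Cast Shadow: ? → N: 2, Y: 2; Y → N: 3, Y: 0; N → N: 0, Y: 1. Average disorder .5
Likes Garlic: Y → N: 3, Y: 0; N → N: 2, Y: 3. Average disorder .6
Complexion: Pale → N: 2, Y: 0; Healthy and Average → N: 1, Y: 2 and N: 2, Y: 1. Average disorder .68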
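The .5, .6 and .68 next to the three inputs on slide 11 are consistent with the count-weighted average entropy (in bits) of each candidate split; reading them as entropy is an assumption. The sketch recomputes them from the branch counts on that slide. Which of Healthy and Average carries which counts is not recoverable from the transcript, but it does not affect the result.

```python
import math

def entropy(counts):
    """Entropy in bits of a node given its class counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

def average_disorder(branches):
    """Count-weighted entropy over the branches of a candidate split."""
    total = sum(sum(b) for b in branches)
    return sum(sum(b) / total * entropy(b) for b in branches)

# (not vampire, vampire) counts per branch, read off slide 11.
candidates = {
    "Cast Shadow": [(2, 2), (3, 0), (0, 1)],   # ?, Y, N
    "Likes Garlic": [(3, 0), (2, 3)],          # Y, N
    "Complexion": [(2, 0), (1, 2), (2, 1)],    # Pale, then the two remaining levels
}
for name, branches in candidates.items():
    print(f"{name}: {average_disorder(branches):.3f}")

best = min(candidates, key=lambda name: average_disorder(candidates[name]))
print("best input:", best)  # → best input: Cast Shadow
```

The values come out as 0.500, 0.607 and 0.689 (the slide appears to truncate the last one to .68), so Cast Shadow leaves the least disorder and becomes the first split.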

12 Tree Construction
Node 1 (root): 1: 37.5%, 0: 62.5%, count 8, split on Cast Shadow
  ? → Node 2: 1: 50.0%, 0: 50.0%, count 4
  Y → Node 3: 1: 0.0%, 0: 100%, count 3
  N → Node 4: 1: 100%, 0: 0.0%, count 1

13 Tree Construction
Nodes 3 and 4 are pure, so only Node 2 (count 4) is searched again with the remaining inputs:
Likes Garlic: Y → N: 2, Y: 0; N → N: 0, Y: 2. Average disorder 0
Complexion: Pale → N: 1, Y: 0; Healthy and Average → N: 0, Y: 1 and N: 1, Y: 1. Average disorder .5

14 Tree Construction
Node 2 is split on Likes Garlic, which gives the final tree:
Node 1 (root): 1: 37.5%, 0: 62.5%, count 8, split on Cast Shadow
  ? → Node 2: 1: 50.0%, 0: 50.0%, count 4, split on Likes Garlic
    Y → Node 5: 1: 0.0%, 0: 100%, count 2
    N → Node 6: 1: 100%, 0: 0.0%, count 2
  Y → Node 3: 1: 0.0%, 0: 100%, count 3
  N → Node 4: 1: 100%, 0: 0.0%, count 1
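The finished vampire tree can be written as nested conditions, which also shows how a new case is classified. This is a sketch assuming the inputs are coded 'Y', 'N' and '?' as on the slides; the function name is hypothetical. Complexion does not appear because the tree never splits on it.

```python
def is_vampire(cast_shadow: str, likes_garlic: str) -> bool:
    """Walk the tree from slides 11-14; inputs are coded 'Y', 'N' or '?'."""
    if cast_shadow not in ("Y", "N", "?"):
        raise ValueError(f"unexpected Cast Shadow value: {cast_shadow!r}")
    if cast_shadow == "Y":       # Node 3: 0% vampires
        return False
    if cast_shadow == "N":       # Node 4: 100% vampires
        return True
    # Unknown shadow: Node 2 is split on Likes Garlic.
    return likes_garlic == "N"   # Node 6 (N): 100%, Node 5 (Y): 0%

print(is_vampire("?", "N"))
```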

15 Evaluation

16 Survival in Titanic Sinking
(1 = survived, 0 = did not survive)
Node 1 (root): 1: 38.2%, 0: 61.8%, count 1309, split on Gender
  Female → Node 23: 1: 72.7%, 0: 27.3%, count 466, split on Age
    < 30.75 or missing → Node 30: 1: 67.8%, 0: 32.2%, count 314
    ≥ 30.75 → Node 31: 1: 82.9%, 0: 17.1%, count 152
  Male → Node 24: 1: 19.1%, 0: 80.9%, count 843, split on Age
    < 9.5 → Node 33: 1: 58.1%, 0: 41.9%, count 43
    ≥ 9.5 or missing → Node 34: 1: 17.0%, 0: 83.0%, count 800
Males with missing Age fall into Node 34.
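Evaluating a passenger means following the branches from the root to a leaf. The sketch below hard-codes the Titanic tree from slide 16, assuming `None` stands for a missing age and the gender strings shown; the function name and the mapping of the female Age branches to Nodes 30 and 31 follow the order in which the slide lists them.

```python
from typing import Optional

def titanic_node(gender: str, age: Optional[float]) -> tuple[int, float]:
    """Return (leaf node id, survival rate) for the tree on slide 16."""
    if gender == "female":                        # Node 23
        if age is None or age < 30.75:            # missing ages go with the younger group
            return 30, 0.678
        return 31, 0.829
    if gender == "male":                          # Node 24
        if age is not None and age < 9.5:         # missing ages go with the older group
            return 33, 0.581
        return 34, 0.170
    raise ValueError(f"unknown gender: {gender!r}")

node, rate = titanic_node("male", 6)
print(node, rate, "predicted: survived" if rate > 0.5 else "predicted: did not survive")
```

Note that missing ages are routed differently in the two branches and the Age cut-point differs (30.75 versus 9.5); both are the kind of node-specific choices a tree allows.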

17 Unequal partition in age
Relationship analysis of the same tree: the Gender split reflects "Women First", the Age split among males (< 9.5) reflects "Children First", and Age is partitioned unequally, with a different cut-point in each branch (30.75 for females, 9.5 for males).

18 Characteristics of Decision Trees
Successive partitioning results in a tree-like visual display with a top node and descendant branches.
Branch partitions may be two-way or multiway.
Partitioning fields may be at the nominal, ordinal, or interval measurement level.
The final result can be a class or a number.

19 Characteristics of Decision Trees
Missing values.- Can be grouped with other values or given their own partition.
Symmetry.- Descendant nodes can be balanced and symmetrical, using the same set of predictors at each level of the subtree.
Asymmetry.- Descendant nodes can be unbalanced, with each subnode partition based on the most powerful predictor for that node.

20 Advantages of Decision trees
The resulting decision trees can detect and visually present contextual effects.
They are easy to understand: the resulting model is a white box.
Flexibility:
  Cut-points for the same input can be different in each node.
  Missing values are allowed.
  Numerical and nominal data can be used as input.
  Output can be nominal or numerical.

21 Disadvantages of Decision Trees
Deep trees tend to overfit the training data, which leads to poor generalization to new cases.

22 Multi Trees
Construct a multitude of decision trees at training time and output the class that is the mode of the individual trees' classes (classification) or the mean of their predictions (regression). This is the idea behind random forests.

23 Creation Method
The idea is to create multiple uncorrelated trees. For each tree Ti:
1.- Select a random subset of the training set.
2.- Create a tree Ti from this random subset.
3.- When creating the partitions of this tree, use only a random subset of the inputs to search for the best input.
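The creation method can be sketched in a few dozen lines. This is a minimal illustration, not the presenter's implementation: it assumes categorical inputs and entropy as the disorder measure, draws each random subset as a bootstrap sample (with replacement, the usual choice in random forests), and uses hypothetical names such as `n_try` for the number of inputs searched at each split.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def split_disorder(rows, field, target):
    """Count-weighted entropy of the children produced by splitting on `field`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[field], []).append(row[target])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

def build_random_tree(rows, inputs, target, n_try, rng):
    """Grow one tree, searching only `n_try` randomly chosen inputs at each split."""
    labels = [row[target] for row in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not inputs:
        return majority
    candidates = rng.sample(inputs, min(n_try, len(inputs)))
    best = min(candidates, key=lambda f: split_disorder(rows, f, target))
    remaining = [f for f in inputs if f != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_random_tree(subset, remaining, target, n_try, rng)
    # Keep the node's majority class as a fallback for values unseen in this sample.
    return (best, branches, majority)

def build_forest(rows, inputs, target, n_trees=25, n_try=1, seed=0):
    """Create trees T1..Tn, each from its own random subset of the training set."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap sample
        forest.append(build_random_tree(sample, list(inputs), target, n_try, rng))
    return forest

def predict(tree, row):
    """Follow the branches of one tree down to a leaf label."""
    while isinstance(tree, tuple):
        field, branches, fallback = tree
        if row[field] not in branches:
            return fallback
        tree = branches[row[field]]
    return tree

# Hypothetical toy data.
rows = [
    {"shadow": "Y", "garlic": "Y", "vampire": "No"},
    {"shadow": "Y", "garlic": "N", "vampire": "No"},
    {"shadow": "N", "garlic": "N", "vampire": "Yes"},
    {"shadow": "?", "garlic": "Y", "vampire": "No"},
    {"shadow": "?", "garlic": "N", "vampire": "Yes"},
]
forest = build_forest(rows, ["shadow", "garlic"], "vampire")
print(Counter(predict(t, {"shadow": "?", "garlic": "N"}) for t in forest))
```

Both sources of randomness (the row sample and the inputs tried at each split) push the trees apart, which is what makes them less correlated than trees grown on the full data.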

24 Evaluation
If the output is a class (classification):
  Evaluate the sample in all the trees.
  Each tree votes for one class.
  The selected class is the one with the most votes.
If the output is a number (regression):
  The final result is the average of the individual trees' results.
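Both evaluation rules are short to express once each tree is treated as a function from a sample to a prediction. The sketch below uses hypothetical hand-written trees and function names; ties in the vote are not handled specially.

```python
from collections import Counter
from statistics import mean

def forest_classify(trees, sample):
    """Classification: every tree votes; return the most voted class."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]

def forest_regress(trees, sample):
    """Regression: return the average of the individual tree outputs."""
    return mean(tree(sample) for tree in trees)

# Hypothetical trees, written as plain functions of a sample.
class_trees = [
    lambda s: "Yes" if s["shadow"] == "N" else "No",
    lambda s: "Yes" if s["garlic"] == "N" else "No",
    lambda s: "Yes" if s["shadow"] == "?" and s["garlic"] == "N" else "No",
]
number_trees = [lambda s: 1.0, lambda s: 2.0, lambda s: 4.5]
case = {"shadow": "?", "garlic": "N"}
print(forest_classify(class_trees, case))  # → Yes
print(forest_regress(number_trees, case))  # → 2.5
```

The first tree votes "No" for this case, but the other two outvote it.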

25 Extra. Computation of Entropy
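The transcript does not reproduce this slide. A standard definition, assuming base-2 logarithms, which is consistent with the .5, .6 and .68 shown on slide 11, is the entropy of a node S with class proportions p_c, and the count-weighted entropy after splitting S on input A into subsets S_v:

```latex
H(S) = -\sum_{c} p_c \log_2 p_c,
\qquad
\bar{H}(S, A) = \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)

% Example: Likes Garlic at the root of the vampire tree
\bar{H}(S, \text{Likes Garlic})
  = \tfrac{3}{8} \cdot 0 + \tfrac{5}{8}\, H\!\left(\tfrac{2}{5}, \tfrac{3}{5}\right)
  \approx \tfrac{5}{8} \times 0.971 \approx 0.61
```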

