Non-Metric Methods: Decision Trees Alexandros Potamianos Dept of ECE, Tech. Univ. of Crete Fall 2004-2005.

2 Decision Trees
- Motivation: some (discrete) features have no obvious notion of similarity or ordering (nominal data), e.g., book type, shape, sound type
- Taxonomies (i.e., trees with is-a relationships) are the oldest form of classification

3 Decision Trees: Definition
- Decision trees are classifiers that label a sample by asking a set of questions hierarchically (a tree of questions)
- Example questions: Is the color red? Is x < 0.5?
- Terminology: root, leaf, node, arc, branch, parent, children, branching factor, depth
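A tree of questions can be represented with two kinds of nodes: internal nodes that hold a question and one child per possible answer, and leaves that hold a class label. Classifying a sample means walking from the root down to a leaf. The sketch below assumes samples are Python dicts of feature values; `Leaf`, `Node`, `classify` and the small fruit tree are hypothetical illustrations, not the tree drawn on the following slides.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Union


@dataclass
class Leaf:
    label: str  # class assigned to every sample that reaches this leaf


@dataclass
class Node:
    question: Callable[[dict], str]  # maps a sample to an answer, e.g. "red"
    children: Dict[str, "Tree"] = field(default_factory=dict)  # one child per answer


Tree = Union[Leaf, Node]


def classify(tree: Tree, sample: dict) -> str:
    """Ask questions from the root down until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.children[tree.question(sample)]
    return tree.label


# Hypothetical two-level tree: Color? at the root, Size? under "red".
fruit_tree = Node(
    question=lambda s: s["color"],
    children={
        "green": Leaf("watermelon"),
        "yellow": Leaf("grapefruit"),
        "red": Node(
            question=lambda s: s["size"],
            children={"big": Leaf("apple"), "small": Leaf("cherry")},
        ),
    },
)

print(classify(fruit_tree, {"color": "red", "size": "small"}))  # cherry
```

Only the questions on the path actually taken are asked, so a green sample never needs a size value.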

4 Fruit classifier
[Figure: decision tree for fruit. The root asks Color? (green / yellow / red); lower nodes ask Size? (big / medium / small), Shape? (round / thin) and Taste? (sweet / sour).]

5 Fruit classification
[Figure: the fruit classifier tree, with a sample traced from the root down to the leaf CHERRY.]

9 Fruit classifier
[Figure: the fruit classifier tree with its leaf labels; the recoverable labels include watermelon, grape, grapefruit and cherry, with grape appearing at two leaves.]

10 Binary Trees
- Binary trees: each parent node has exactly two children (branching factor = 2)
- Any tree can be represented as a binary tree by changing the set of questions and increasing the tree depth
- e.g., the three-way question Color? (green / yellow / red) becomes Color = green? (Y: green; N: Color = yellow? (Y: yellow; N: red))
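The conversion in the Color? example can be done mechanically: a k-way question becomes a chain of k-1 yes/no questions of the form feature = value?, which increases depth but keeps every original branch reachable. A minimal sketch; `Binary`, `to_binary`, `follow` and `depth` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Binary:
    feature: str
    value: str    # the question asked here is "feature = value?"
    yes: "BTree"  # branch taken when the answer is Y
    no: "BTree"   # branch taken when the answer is N


BTree = Union[str, Binary]  # a str is a leaf naming the original branch


def to_binary(feature: str, values: List[str]) -> BTree:
    """Rewrite one k-way question on `feature` as a chain of k-1 yes/no questions."""
    if len(values) == 1:
        return values[0]  # last remaining answer: no further question needed
    return Binary(feature, values[0], values[0], to_binary(feature, values[1:]))


def follow(tree: BTree, value: str) -> str:
    """Answer the binary questions for one feature value; return the branch reached."""
    while isinstance(tree, Binary):
        tree = tree.yes if value == tree.value else tree.no
    return tree


def depth(tree: BTree) -> int:
    return 0 if isinstance(tree, str) else 1 + max(depth(tree.yes), depth(tree.no))


colors = ["green", "yellow", "red"]
chain = to_binary("Color", colors)
for c in colors:
    assert follow(chain, c) == c  # every original branch is still reachable
print(depth(chain))  # 2: one 3-way question became two levels of binary questions
```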

11 Decision Trees: Problems
1. List of questions (features): all possible questions are considered
2. Which question to split on first (best split): the questions that split the data best, i.e., that reduce impurity the most at each node, are asked first
3. Stopping criteria (pruning criteria): stop when further splits do not reduce impurity
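For a nominal feature, one common way to make "all possible questions" concrete (an assumption here, since the slide does not fix the form of the questions) is to consider every binary question "is x in S?" for a subset S of the feature's values. A subset and its complement give the same split, so a feature with k values yields 2^(k-1) - 1 distinct questions. `candidate_subsets` is a hypothetical helper name.

```python
from itertools import combinations
from typing import List, Tuple


def candidate_subsets(values: List[str]) -> List[Tuple[str, ...]]:
    """All distinct binary questions 'x in S?' for a nominal feature.

    S and its complement define the same split, so only subsets that contain
    values[0] are kept; the full set is excluded because it does not split.
    """
    subsets = []
    for r in range(1, len(values)):  # proper, non-empty subsets only
        for s in combinations(values, r):
            if values[0] in s:  # drop the mirror-image duplicate
                subsets.append(s)
    return subsets


qs = candidate_subsets(["green", "yellow", "red"])
print(qs)  # [('green',), ('green', 'yellow'), ('green', 'red')]
```

The count grows exponentially with the number of values, which is why exhaustive search over questions is only practical for features with few values.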

12 Best Split example
- Two-class problem with 100 examples from each of $\omega_1$ and $\omega_2$
- Three binary questions Q1, Q2 and Q3 split the data as follows, with counts given as ($\omega_1$, $\omega_2$):
  1. Q1: node 1 (50, 50), node 2 (50, 50)
  2. Q2: node 1 (100, 0), node 2 (0, 100)
  3. Q3: node 1 (80, 0), node 2 (20, 100)

13 Impurity Measures
- Impurity measures how mixed a node is; a node is pure (zero impurity) if all its training examples come from a single class. $P(\omega_i)$ is the fraction of the node's training examples that belong to class $\omega_i$
- Entropy impurity: $i(N) = -\sum_i P(\omega_i) \log_2 P(\omega_i)$
- Variance impurity (two-class): $i(N) = P(\omega_1)\, P(\omega_2)$
- Gini impurity: $i(N) = 1 - \sum_i P^2(\omega_i)$
- Misclassification impurity: $i(N) = 1 - \max_i P(\omega_i)$
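The four measures can be computed directly from the class counts at a node. A small sketch; the function names are illustrative, and `variance` assumes exactly two classes, as on the slide. Pure nodes score 0 under every measure, while the balanced two-class node gets each measure's maximum.

```python
import math
from typing import List, Sequence


def probs(counts: Sequence[int]) -> List[float]:
    """P(w_i): fraction of the node's training examples in each class."""
    n = sum(counts)
    return [c / n for c in counts]


def entropy(counts: Sequence[int]) -> float:
    # p * log2(1/p) equals -p * log2(p); classes with p = 0 contribute 0
    return sum(p * math.log2(1 / p) for p in probs(counts) if p > 0)


def variance(counts: Sequence[int]) -> float:
    p1, p2 = probs(counts)  # two-class measure: unpacking fails for other class counts
    return p1 * p2


def gini(counts: Sequence[int]) -> float:
    return 1 - sum(p * p for p in probs(counts))


def misclassification(counts: Sequence[int]) -> float:
    return 1 - max(probs(counts))


for node in [(50, 50), (80, 20), (100, 0)]:
    print(node, [round(f(node), 3) for f in (entropy, variance, gini, misclassification)])
# (50, 50) [1.0, 0.25, 0.5, 0.5]
# (80, 20) [0.722, 0.16, 0.32, 0.2]
# (100, 0) [0.0, 0.0, 0.0, 0.0]
```

For two classes, Gini impurity is exactly twice the variance impurity, since 1 - p^2 - (1 - p)^2 = 2p(1 - p), so the two measures always rank splits the same way.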

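14 Total Impurity
- Total impurity at depth 0: $i(\text{depth}=0) = i(N)$
- Total impurity at depth 1: $i(\text{depth}=1) = p(N_L)\, i(N_L) + p(N_R)\, i(N_R)$, where $p(N_L)$ and $p(N_R)$ are the fractions of the samples at $N$ sent to the left and right child
[Figure: node N at depth 0, split by a yes/no question into children N_L and N_R at depth 1.]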
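Subtracting the depth-1 total from the depth-0 impurity gives the impurity drop of a split; choosing the question with the largest drop is the same as choosing the one with the smallest depth-1 total impurity. Written out in the notation of this slide, and evaluated as a worked illustration for question Q3 of the best-split example under entropy impurity (the root holds (100, 100), so $i(N) = 1$ bit):

```latex
\Delta i(N) = i(\text{depth}=0) - i(\text{depth}=1)
            = i(N) - p(N_L)\, i(N_L) - p(N_R)\, i(N_R)

% Q3, entropy impurity: root (100, 100) gives i(N) = 1 bit;
% children (80, 0) and (20, 100) give i(depth=1) = 0.39
\Delta i(N) \approx 1 - 0.39 = 0.61
```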

15 Impurity Example
- Question Q3 with entropy impurity: node 1 (80, 0), node 2 (20, 100)
- $i(N_1) = 0$
- $i(N_2) = -\frac{20}{120}\log_2\frac{20}{120} - \frac{100}{120}\log_2\frac{100}{120} \approx 0.65$
- $p(N_1) = 80/200 = 0.4$, $p(N_2) = 120/200 = 0.6$
- $i(\text{total}) = p(N_1)\, i(N_1) + p(N_2)\, i(N_2) = 0 + 0.6 \times 0.65 = 0.39$
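The same computation can be repeated for all three questions of the best-split example to see which one a greedy tree builder would ask first. A short sketch assuming entropy impurity, as on this slide; `entropy` and `total_impurity` are illustrative helper names. It reproduces the 0.39 for Q3 and shows that Q2, which produces two pure nodes, has the lowest total impurity.

```python
import math


def entropy(counts):
    n = sum(counts)
    # (c/n) * log2(n/c) equals -p * log2(p); empty classes contribute 0
    return sum((c / n) * math.log2(n / c) for c in counts if c > 0)


def total_impurity(children):
    """Weighted impurity after a split: sum over children of p(N_k) * i(N_k)."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * entropy(c) for c in children)


# (w1, w2) counts in the two child nodes for each question of the best-split example
questions = {
    "Q1": [(50, 50), (50, 50)],
    "Q2": [(100, 0), (0, 100)],
    "Q3": [(80, 0), (20, 100)],
}
root = entropy((100, 100))  # impurity at depth 0: 1 bit
for name, children in questions.items():
    t = total_impurity(children)
    print(name, round(t, 2), "drop:", round(root - t, 2))
# Q1 1.0 drop: 0.0
# Q2 0.0 drop: 1.0
# Q3 0.39 drop: 0.61
best = min(questions, key=lambda q: total_impurity(questions[q]))
print("best:", best)  # best: Q2
```

Q1 leaves the class mix unchanged and gains nothing, Q3 gains about 0.61 bits, and Q2 removes all impurity.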

16 Continuous Example
- For continuous features, questions are of the form x < a, where x is a feature and a is a constant
- Decision boundaries (two-class, 2-D example):
[Figure: the (x1, x2) plane partitioned into decision regions R1 and R2.]
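For a continuous feature there are infinitely many constants a, but only thresholds that fall between two consecutive distinct training values give different splits, so it is enough to test the midpoints. The sketch below scans them in sorted order, updating class counts incrementally, and keeps the question x < a with the lowest weighted impurity. It assumes Gini impurity and integer class labels 0..n_classes-1; `best_threshold` is a hypothetical name.

```python
from typing import List, Tuple


def gini(counts: List[int]) -> float:
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0


def best_threshold(x: List[float], y: List[int], n_classes: int = 2) -> Tuple[float, float]:
    """Return (a, total impurity) for the question 'x < a?' that minimizes
    the weighted Gini impurity of the two children.

    Candidate thresholds are midpoints between consecutive distinct sorted values.
    Returns (nan, inf) if all values are equal and no split exists.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    left = [0] * n_classes   # class counts for x < a
    right = [0] * n_classes  # class counts for x >= a
    for _, label in pairs:
        right[label] += 1
    best = (float("nan"), float("inf"))
    for i in range(n - 1):
        label = pairs[i][1]
        left[label] += 1     # move sample i from the right child to the left child
        right[label] -= 1
        if pairs[i][0] == pairs[i + 1][0]:
            continue         # no threshold separates equal values
        a = (pairs[i][0] + pairs[i + 1][0]) / 2
        total = (i + 1) / n * gini(left) + (n - i - 1) / n * gini(right)
        if total < best[1]:
            best = (a, total)
    return best


x = [0.1, 0.3, 0.35, 0.6, 0.8, 0.9]
y = [0, 0, 0, 1, 1, 1]
print(best_threshold(x, y))  # (0.475, 0.0)
```

Because each question tests a single feature against a constant, every split adds a boundary parallel to one of the axes in the (x1, x2) plane.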

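17 Summary
- Decision trees are useful tools for categorical classification, especially with nominal (non-metric) data
- CART (Classification And Regression Trees) grows a tree by choosing, at each node, the question that minimizes impurity on the training set
- Decision region shape: with questions of the form x < a, decision boundaries are axis-parallel
- CART is a useful tool for feature selection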
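Putting the pieces together gives a greedy, CART-style tree grower: at each node, try every feature and every midpoint threshold, keep the question with the lowest weighted impurity, and stop when no question reduces impurity. This is a simplified sketch under stated assumptions (continuous features, binary questions x[j] < a, Gini impurity, no pruning), not a full CART implementation. Counting how often each feature is chosen is one rough way to see the feature-selection effect listed above; `grow`, `predict` and the `used` counter are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def gini(labels: List[int]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow(X: List[List[float]], y: List[int], used: Optional[Counter] = None,
         min_gain: float = 1e-12):
    """Greedily grow a binary tree of questions 'x[j] < a?'.

    Returns a class label (leaf) or a tuple (j, a, yes_subtree, no_subtree).
    Growth stops when the best question does not reduce impurity.
    `used` counts how often each feature j is chosen.
    """
    used = Counter() if used is None else used
    parent = gini(y)
    best = None  # (total impurity, feature j, threshold a)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            a = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[j] < a]
            right = [lab for row, lab in zip(X, y) if row[j] >= a]
            total = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or total < best[0]:
                best = (total, j, a)
    if best is None or parent - best[0] <= min_gain:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    _, j, a = best
    used[j] += 1
    L = [i for i, row in enumerate(X) if row[j] < a]
    R = [i for i, row in enumerate(X) if row[j] >= a]
    return (j, a,
            grow([X[i] for i in L], [y[i] for i in L], used, min_gain),
            grow([X[i] for i in R], [y[i] for i in R], used, min_gain))


def predict(tree, x: List[float]) -> int:
    while isinstance(tree, tuple):
        j, a, yes, no = tree
        tree = yes if x[j] < a else no
    return tree


# Toy 2-D data: the class depends only on feature 0; feature 1 is noise.
X = [[1, 9], [2, 1], [3, 5], [7, 4], [8, 8], [9, 2]]
y = [0, 0, 0, 1, 1, 1]
used = Counter()
tree = grow(X, y, used)
print(tree)  # (0, 5.0, 0, 1)
print(used)  # Counter({0: 1})
```

In the toy data only the first feature separates the classes, so the tree asks a single question about it and the second feature is never used.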

