CS690L Data Mining: Classification
Reference: J. Han and M. Kamber, Data Mining: Concepts and Techniques. Yong Fu
Classification
Classification: determine the class or category of an object based on its properties.
Two stages (see the sketch after this list):
- Learning stage: construction of a classification function or model.
- Classification stage: prediction of the classes of objects using the function or model.
Tools for classification: decision trees, Bayesian networks, neural networks, regression.
Problem: given a set of objects whose classes are known, called the training set, derive a classification model that can correctly classify future objects.
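A minimal sketch of the two stages in Python, using scikit-learn's DecisionTreeClassifier as a stand-in classifier (the library choice and the toy data are illustrative assumptions, not from the slides):

```python
# Minimal two-stage sketch; scikit-learn and the toy data are assumptions.
from sklearn.tree import DecisionTreeClassifier

# Training set: attribute vectors of objects whose classes are known.
X_train = [[0, 1], [1, 0], [1, 1], [0, 0]]
y_train = ["P", "DP", "P", "DP"]

model = DecisionTreeClassifier()
model.fit(X_train, y_train)          # learning stage: construct the model

print(model.predict([[1, 1]]))       # classification stage: e.g. ['P']
```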
Classification: Decision Tree
Classification model: decision tree.
Method: top-down induction of decision trees.
Data representation: every object is represented by a vector of values on a fixed set of attributes; if a relation is defined on the attributes, an object is a tuple in the relation. A special attribute, called the class attribute, gives the group/category the object belongs to; it is the dependent attribute to be predicted.
Learning stage: induction of a decision tree that classifies the training set.
Classification stage: the decision tree classifies new objects.
Definitions
A decision tree is a tree in which each non-leaf node corresponds to an attribute of the objects, and each branch from a non-leaf node to one of its children represents a value of that attribute. Each leaf node is labeled by a class of the objects.
Classification using decision trees: starting from the root, an object follows the path to a leaf node, taking branches according to its attribute values along the way; the leaf gives the class of the object (see the sketch below).
Alternative view of a decision tree:
- Node/branch: a discrimination test.
- Node: the subset of objects satisfying the tests on the path to it.
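A small sketch of this traversal over a nested-dict tree encoding (the encoding, the attribute names, and the particular tree are illustrative assumptions):

```python
# Hypothetical encoding: a non-leaf node is {attribute: {value: subtree}};
# a leaf node is just a class label.
tree = {"Outlook": {
    "Overcast": "P",
    "Sunny": {"Hum": {"high": "DP", "normal": "P"}},
    "Rainy": {"Wind": {"True": "DP", "False": "P"}},
}}

def classify(node, obj):
    """Follow the path from the root to a leaf, taking branches
    according to the object's attribute values along the way."""
    while isinstance(node, dict):
        attribute = next(iter(node))            # attribute tested here
        node = node[attribute][obj[attribute]]  # branch on the object's value
    return node                                 # leaf = class label

print(classify(tree, {"Outlook": "Rainy", "Wind": "False"}))  # P
```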
Decision Tree Induction
Induction of decision trees: starting from the training set, recursively select attributes to split nodes, thus partitioning the objects.
Termination condition: when to stop splitting a node.
Selection of the attribute for the splitting test: choose the best split, which requires a measure of split quality.
ID3 algorithm:
- Selection: the attribute with maximal information gain.
- Termination condition: all objects in a node are in a single class.
ID3 Algorithm
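A minimal Python sketch of ID3 as just described (select the attribute with maximal information gain; stop when all objects in a node share one class); representing the dataset as a list of dicts with a "Class" key is an assumption:

```python
import math
from collections import Counter

def entropy(objects):
    """Ent = -sum p_i log2 p_i over the class distribution of objects."""
    counts = Counter(o["Class"] for o in objects)
    return -sum(n / len(objects) * math.log2(n / len(objects))
                for n in counts.values())

def id3(objects, attributes):
    classes = {o["Class"] for o in objects}
    if len(classes) == 1 or not attributes:
        # Termination: single class (or no attributes left: majority class).
        return Counter(o["Class"] for o in objects).most_common(1)[0][0]

    def gain(a):
        """Information gain of splitting objects on attribute a."""
        subsets = [[o for o in objects if o[a] == v]
                   for v in {o[a] for o in objects}]
        return entropy(objects) - sum(len(s) / len(objects) * entropy(s)
                                      for s in subsets)

    best = max(attributes, key=gain)            # best split attribute
    rest = [a for a in attributes if a != best]
    return {best: {v: id3([o for o in objects if o[best] == v], rest)
                   for v in {o[best] for o in objects}}}
```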
Example
Training set: 14 objects described by the attributes Outlook, Temp, Hum, and Wind, each labeled with class P or DP; 9 objects are P and 5 are DP.
Example: Decision Tree Building
Information content of C (expected information for the classification):
I(P) = Ent(C) = -((9/14) log2 (9/14) + (5/14) log2 (5/14)) = 0.940
For each attribute Ai:
Step 1: compute the entropy of each subset induced by Ai. For Outlook:
Ent(Sunny) = -((2/5) log2 (2/5) + (3/5) log2 (3/5)) = 0.97
Ent(Rainy) = 0.97
Ent(Overcast) = 0
Step 2: compute the expected information based on the partitioning into subsets by Ai:
Ent(C, Outlook) = (5/14) Ent(Sunny) + (5/14) Ent(Rainy) + (4/14) Ent(Overcast) = (5/14)(0.97) + (5/14)(0.97) + (4/14)(0) = 0.69
Step 3: Gain(C, Outlook) = Ent(C) - Ent(C, Outlook) = 0.940 - 0.69 = 0.25
Select the attribute that maximizes information gain, build a node for the selected attribute, and recursively build nodes for its children (a numeric check follows).
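The same numbers in a few lines of Python (the ent helper is illustrative):

```python
import math

def ent(*counts):
    """Entropy of a class distribution given as counts."""
    total = sum(counts)
    return -sum(n / total * math.log2(n / total) for n in counts if n)

ent_c = ent(9, 5)                                        # Ent(C)
ent_outlook = (5/14)*ent(2, 3) + (5/14)*ent(3, 2) + (4/14)*ent(4)
print(round(ent_c, 3))                # 0.94
print(round(ent_outlook, 3))          # 0.694
print(round(ent_c - ent_outlook, 3))  # 0.247, i.e. about 0.25
```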
Level 1: Decision Tree Building
[Tables: the training set partitioned by Outlook (columns Temp, Hum, Wind, Class)]
- Overcast branch: 4 objects, all class P.
- Sunny branch: 5 objects, 2 P and 3 DP.
- Rainy branch: 5 objects, 3 P and 2 DP.
Decision Tree
[Figure: the resulting tree, with Outlook at the root; the Overcast branch is a pure P leaf, while the Sunny and Rainy branches are split further.]
Generated Rules
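Each generated rule corresponds to a root-to-leaf path (rule deriving is described on the C4.5 slide below). A small sketch over the illustrative nested-dict tree used earlier:

```python
def rules(node, conditions=()):
    """Yield one (LHS, class) rule per root-to-leaf path."""
    if not isinstance(node, dict):                   # leaf: emit the rule
        yield " AND ".join(f"{a} = {v}" for a, v in conditions), node
        return
    attribute = next(iter(node))
    for value, subtree in node[attribute].items():
        yield from rules(subtree, conditions + ((attribute, value),))

tree = {"Outlook": {"Overcast": "P",
                    "Sunny": {"Hum": {"high": "DP", "normal": "P"}},
                    "Rainy": {"Wind": {"True": "DP", "False": "P"}}}}
for lhs, cls in rules(tree):
    print(f"IF {lhs} THEN class = {cls}")
```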
C4.5: Extensions to ID3
Gain ratio: gain favors attributes with many values. GainRatio(C, A) = Gain(C, A) / Ent(P), where P = (|T1|/|C|, |T2|/|C|, ..., |Tm|/|C|) and the Ti are the partitions of C based on the objects' values of A. E.g. GainRatio(Outlook) = Gain(Outlook) / ( -((5/14) log2 (5/14) + (5/14) log2 (5/14) + (4/14) log2 (4/14)) ); a numeric sketch follows.
Missing values: consider only the objects for which the attribute is defined.
Continuous attributes: consider all binary splits A <= ai and A > ai, where ai is the i-th value of A; compute the gain (or gain ratio) of each and choose the split that maximizes it.
Over-fitting: change the termination condition, e.g. if a subtree is dominated by one class, stop splitting.
Tree pruning: replace a subtree by a single leaf node when doing so is expected to reduce the classification error.
Rule deriving: a rule corresponds to a path from the root to a leaf; the LHS is the conjunction of the tests along the path and the RHS is the class prediction.
Rule simplification: remove some conditions from the LHS.
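A numeric sketch of the gain-ratio example above, reusing the slide's figures (the ent helper is the same illustrative one as before):

```python
import math

def ent(*counts):
    total = sum(counts)
    return -sum(n / total * math.log2(n / total) for n in counts if n)

split_info = ent(5, 5, 4)          # Ent(P) for Outlook's partition sizes
gain_ratio = 0.247 / split_info    # Gain(C, Outlook) / Ent(P)
print(round(split_info, 3))        # 1.577
print(round(gain_ratio, 3))        # 0.157
```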
Evaluation of Decision Tree Methods
- Complexity
- Expressive power
- Robustness
- Effectiveness