Chapter 4: Classification: Basic Concepts, Decision Trees & Model Evaluation
Part 1: Classification with Decision Trees
Classification: Definition
Example of Classification Task
General Approach for Building Classification Model
Classification Techniques
Example of Decision Tree
Another Example of Decision Tree
Decision Tree Classification Task
Apply Model to Test Data
Decision Tree Classification Task
Decision Tree Induction
General Structure of Hunt’s Algorithm
Hunt’s Algorithm
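Hunt’s algorithm grows the tree by recursively partitioning the training records into successively purer subsets. The minimal Python sketch below only illustrates that recursive structure, not the slides' exact formulation: the dictionary-based node representation and the naive attribute choice (first attribute in the list) are assumptions, and a real learner would pick the test condition that maximizes purity gain at each step.

    from collections import Counter

    def hunts_algorithm(records, labels, attributes):
        # Case 1: all records at the node share one class -> create a leaf.
        if len(set(labels)) == 1:
            return {"type": "leaf", "class": labels[0]}
        # Case 2: no attributes left to test -> leaf labelled with the majority class.
        if not attributes:
            return {"type": "leaf", "class": Counter(labels).most_common(1)[0][0]}
        # Case 3: split on an attribute test and recurse on each partition.
        # (Illustration only: a real learner chooses the best test, not the first attribute.)
        attr = attributes[0]
        children = {}
        for value in set(r[attr] for r in records):
            subset = [(r, y) for r, y in zip(records, labels) if r[attr] == value]
            children[value] = hunts_algorithm([r for r, _ in subset],
                                              [y for _, y in subset],
                                              attributes[1:])
        return {"type": "node", "attribute": attr, "children": children}

    # Illustrative call (attribute names and records are hypothetical):
    records = [{"refund": "yes", "status": "single"},
               {"refund": "no",  "status": "married"},
               {"refund": "no",  "status": "single"}]
    print(hunts_algorithm(records, ["no", "no", "yes"], ["refund", "status"]))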
Design Issues of Decision Tree Induction
Methods for Expressing Test Conditions
Test Condition for Nominal Attributes
Test Condition for Ordinal Attributes
Test Condition for Continuous Attributes
Splitting Based on Continuous Attributes
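A continuous attribute can be turned into a test condition either with a binary split (A <= v versus A > v) or with a multi-way split after discretizing the value range into ordinal bins. The tiny sketch below illustrates both forms; the threshold and bin edges are made-up values, not taken from the slides.

    # Two ways to express a test condition on a continuous attribute.
    def binary_split(value, threshold=80):
        # Binary split: (A <= v) vs. (A > v).
        return "left" if value <= threshold else "right"

    def multiway_split(value, edges=(25, 50, 80)):
        # Multi-way split: discretize into ordinal ranges.
        for i, edge in enumerate(edges):
            if value <= edge:
                return "bin_%d" % i
        return "bin_%d" % len(edges)

    print(binary_split(95))     # right
    print(multiway_split(40))   # bin_1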
How to Determine the Best Split / 1
How to Determine the Best Split / 2
Measures of Node Impurity
Finding the Best Split / 1
Finding the Best Split / 2
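The best split is chosen greedily: measure the impurity of the parent node before splitting, measure the size-weighted impurity of the candidate child nodes, and keep the test condition with the largest reduction (gain). A minimal sketch of that comparison, with the impurity measure left pluggable (the function name split_gain is illustrative):

    def split_gain(parent_labels, children_labels, impurity):
        # Gain = impurity(parent) - weighted average impurity of the children,
        # where `impurity` is any node-impurity function (GINI, entropy, error).
        n = len(parent_labels)
        weighted = sum(len(child) / n * impurity(child) for child in children_labels)
        return impurity(parent_labels) - weighted

Any of the impurity measures defined on the following slides (GINI, entropy, classification error) can be passed in as the impurity argument.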
Measure of Impurity: GINI
Computing GINI Index of a Single Node
Computing GINI Index for a Collection of Nodes
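As a concrete illustration of the two computations named above: GINI(t) = 1 - sum_j p(j|t)^2 for a single node, and a split is scored by the size-weighted sum of the child nodes' GINI values. The class counts in the example calls are illustrative, not taken from the slides.

    from collections import Counter

    def gini(labels):
        # GINI(t) = 1 - sum_j p(j|t)^2 over the records at node t.
        n = len(labels)
        if n == 0:
            return 0.0
        return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

    def gini_split(children_labels):
        # GINI_split = sum_i (n_i / n) * GINI(child_i).
        n = sum(len(child) for child in children_labels)
        return sum(len(child) / n * gini(child) for child in children_labels)

    print(gini(["C1"] * 3 + ["C2"] * 3))   # 0.5, the maximum for a 2-class node
    print(gini(["C1"] * 6))                # 0.0, a pure node
    print(gini_split([["C1"] * 4, ["C1"] + ["C2"] * 5]))   # weighted GINI of a 2-way split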
Binary Attributes: Computing GINI Index
Categorical Attributes: Computing GINI Index
Continuous Attributes: Computing GINI Index / 1
Continuous Attributes: Computing GINI Index / 2
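The efficient procedure for a continuous attribute sorts the records on that attribute once and then sweeps the candidate thresholds (midpoints between adjacent values), updating the left/right class counts incrementally so that each candidate split is scored without rescanning the data. The sketch and example data below are illustrative only.

    from collections import Counter

    def best_gini_split(values, labels):
        # Sort once on the attribute, then sweep candidate thresholds.
        order = sorted(range(len(values)), key=lambda i: values[i])
        values = [values[i] for i in order]
        labels = [labels[i] for i in order]
        n = len(labels)
        left, right = Counter(), Counter(labels)

        def gini_of(counts, total):
            if total == 0:
                return 0.0
            return 1.0 - sum((c / total) ** 2 for c in counts.values())

        best = (float("inf"), None)
        for i in range(n - 1):
            # Move record i from the right partition to the left partition.
            left[labels[i]] += 1
            right[labels[i]] -= 1
            if values[i] == values[i + 1]:
                continue  # identical values cannot be separated by a threshold
            threshold = (values[i] + values[i + 1]) / 2
            n_left, n_right = i + 1, n - i - 1
            weighted = (n_left / n) * gini_of(left, n_left) + (n_right / n) * gini_of(right, n_right)
            best = min(best, (weighted, threshold))
        return best  # (weighted GINI of the best split, its threshold)

    # Illustrative data, not the slides' example:
    print(best_gini_split([60, 70, 85, 90, 95, 120],
                          ["no", "no", "yes", "yes", "yes", "no"]))   # (0.25, 77.5)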
Measure of Impurity: Entropy
Computing Entropy of a Single Node
Computing Information Gain After Splitting
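For reference, Entropy(t) = -sum_j p(j|t) * log2 p(j|t), and the information gain of a split is the parent's entropy minus the size-weighted entropy of the children. A small self-contained sketch; the example records are illustrative.

    import math
    from collections import Counter

    def entropy(labels):
        # Entropy(t) = -sum_j p(j|t) * log2 p(j|t); 0 for a pure node.
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(parent_labels, children_labels):
        # GAIN_split = Entropy(parent) - sum_i (n_i / n) * Entropy(child_i).
        n = len(parent_labels)
        weighted = sum(len(child) / n * entropy(child) for child in children_labels)
        return entropy(parent_labels) - weighted

    # A 50/50 parent split into two pure children gains a full bit of information.
    parent = ["+"] * 4 + ["-"] * 4
    print(information_gain(parent, [["+"] * 4, ["-"] * 4]))   # 1.0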
Problems with Information Gain
Gain Ratio
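Gain ratio corrects the bias of information gain toward splits with many partitions by dividing the gain by SplitINFO, the entropy of the partition sizes. A small sketch; the partition sizes in the example are made up to show how the penalty grows with the number of partitions.

    import math

    def split_info(children_sizes):
        # SplitINFO = -sum_i (n_i / n) * log2(n_i / n): entropy of the partition sizes.
        n = sum(children_sizes)
        return -sum((s / n) * math.log2(s / n) for s in children_sizes)

    def gain_ratio(info_gain, children_sizes):
        # GainRATIO = GAIN_split / SplitINFO.
        return info_gain / split_info(children_sizes)

    # Splitting 10 records into 2 equal halves vs. 10 singleton partitions:
    print(split_info([5, 5]))     # 1.0
    print(split_info([1] * 10))   # log2(10), roughly 3.32, a much larger penalty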
Measure of Impurity: Classification Error
Computing Error of a Single Node
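Classification error at a node is one minus the fraction of the most frequent class. A minimal sketch with illustrative class counts:

    from collections import Counter

    def classification_error(labels):
        # Error(t) = 1 - max_j p(j|t).
        n = len(labels)
        return 1.0 - max(Counter(labels).values()) / n

    print(classification_error(["C1"] + ["C2"] * 5))       # 1 - 5/6, roughly 0.167
    print(classification_error(["C1"] * 3 + ["C2"] * 3))   # 0.5, the maximum for 2 classes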
Comparison among Impurity Measures for Binary (2-Class) Classification Problems
Misclassification Error vs. Gini Index
Example: C4.5
- Simple depth-first construction.
- Uses information gain.
- Sorts continuous attributes at each node.
- Needs the entire data set to fit in memory.
- Unsuitable for large data sets (needs out-of-core sorting).
- You can download the software from:
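For hands-on experimentation without implementing C4.5 itself, scikit-learn's decision tree can serve as a rough stand-in: it is a CART-style learner, not C4.5, but with the criterion set to "entropy" it also chooses splits by information gain and, like C4.5 above, keeps the whole training set in memory. A hedged sketch:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Not C4.5: an entropy-based CART-style tree used here only as an illustration.
    X, y = load_iris(return_X_y=True)
    clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
    print(export_text(clf))   # textual, depth-first printout of the induced tree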
Scalable Decision Tree Induction / 1
- How scalable is decision tree induction? The basic algorithm assumes the training data fits in memory, so it is particularly suitable for small data sets.
- SLIQ (EDBT'96, Mehta et al.): builds an index (attribute list) for each attribute; only the class list and the current attribute list reside in memory.
Scalable Decision Tree Induction / 2
SLIQ sample data for the class buys_computer:

  RID  Credit_rating  Age  Buys_computer
  1    excellent      38   yes
  2    excellent      26   yes
  3    fair           35   no
  4    excellent      49   no

Disk-resident attribute lists:

  Credit_rating  RID        Age  RID
  excellent      1          ...  ...
  excellent      2
  excellent      4
  fair           3

Memory-resident class list:

  RID  Buys_computer  node
  1    yes            ...
  2    yes            ...
  3    no             ...
  4    no             ...
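To make the picture above concrete, the sketch below mimics SLIQ's two kinds of structures in plain Python: one presorted attribute list of (value, RID) pairs per attribute, and a single class list mapping each RID to its class label and current leaf node. This is an in-memory illustration only; in SLIQ the attribute lists are disk resident and only the class list stays in memory, and the node assignments shown are placeholders.

    # The four sample records from the table above.
    records = {
        1: {"credit_rating": "excellent", "age": 38, "buys_computer": "yes"},
        2: {"credit_rating": "excellent", "age": 26, "buys_computer": "yes"},
        3: {"credit_rating": "fair",      "age": 35, "buys_computer": "no"},
        4: {"credit_rating": "excellent", "age": 49, "buys_computer": "no"},
    }

    # One presorted attribute list per attribute: (attribute value, RID) pairs.
    attribute_lists = {
        attr: sorted((rec[attr], rid) for rid, rec in records.items())
        for attr in ("credit_rating", "age")
    }

    # Memory-resident class list: RID -> (class label, current leaf node).
    class_list = {rid: {"class": rec["buys_computer"], "node": "root"}
                  for rid, rec in records.items()}

    print(attribute_lists["age"])   # [(26, 2), (35, 3), (38, 1), (49, 4)]
    print(class_list[1])            # {'class': 'yes', 'node': 'root'}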
Decision Tree Based Classification
Advantages:
- Inexpensive to construct
- Extremely fast at classifying unknown records
- Easy to interpret for small-sized trees
- Accuracy is comparable to other classification techniques for many data sets

Practical Issues of Classification
- Underfitting and overfitting
- Missing values
- Costs of classification