Chapter 4 Classification: Basic Concepts, Decision Trees & Model Evaluation, Part 1: Classification with Decision Trees

Classification: Definition

Example of Classification Task

General Approach for Building a Classification Model

Classification Techniques

Example of Decision Tree

Another Example of Decision Tree

Decision Tree Classification Task

Apply Model to Test Data

Decision Tree Classification Task

Decision Tree Induction

General Structure of Hunt’s Algorithm

Hunt’s Algorithm
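Hunt's algorithm grows a tree recursively from the set D_t of training records that reach a node t: if every record in D_t has the same class y_t, t becomes a leaf labelled y_t; if D_t is empty, t becomes a leaf labelled with a default class; otherwise an attribute test splits D_t into smaller subsets and the procedure is applied to each. A minimal Python sketch, assuming nominal attributes and multiway splits; the `hunt` and `majority` names are hypothetical, and picking the first remaining attribute is a deliberate simplification (real inducers choose the split with the best impurity measure).

```python
from collections import Counter


def majority(labels):
    """Most common class label (ties broken by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]


def hunt(records, labels, attributes, default=None):
    """Grow a tree in the style of Hunt's algorithm.

    records: list of dicts mapping attribute name -> nominal value
    labels: class labels aligned with records
    attributes: attribute names still available for splitting
    Returns a class label (leaf) or {"attr": name, "branches": {value: subtree}}.
    """
    if not records:                 # D_t empty: leaf with the default class
        return default
    if len(set(labels)) == 1:       # D_t pure: leaf with that class
        return labels[0]
    if not attributes:              # no test left: fall back to the majority class
        return majority(labels)
    attr, rest = attributes[0], attributes[1:]  # ASSUMPTION: naive attribute choice
    node = {"attr": attr, "branches": {}}
    for value in sorted({r[attr] for r in records}):  # multiway split
        idx = [i for i, r in enumerate(records) if r[attr] == value]
        node["branches"][value] = hunt(
            [records[i] for i in idx],
            [labels[i] for i in idx],
            rest,
            default=majority(labels),  # parent's majority class for empty children
        )
    return node


# Hypothetical training data
records = [
    {"Refund": "Yes", "MarSt": "Single"},
    {"Refund": "No", "MarSt": "Married"},
    {"Refund": "No", "MarSt": "Single"},
    {"Refund": "No", "MarSt": "Divorced"},
]
labels = ["No", "No", "Yes", "Yes"]
tree = hunt(records, labels, ["Refund", "MarSt"])
print(tree)
```

The three stopping cases map one-to-one onto the cases of Hunt's algorithm; everything else (attribute choice, binary vs. multiway splits, stopping early) is a design issue left open by the general structure.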

Design Issues of Decision Tree Induction

Methods for Expressing Test Conditions

Test Condition for Nominal Attributes

Test Condition for Ordinal Attributes
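A binary test on a nominal attribute may group its values in any way, whereas a binary test on an ordinal attribute should keep the groups contiguous so the value order is preserved. A minimal Python sketch that enumerates both kinds of candidate two-way splits; the function names and the example values (CarType-like and Size-like attributes) are hypothetical.

```python
from itertools import combinations


def nominal_binary_splits(values):
    """All ways to divide distinct nominal values into two non-empty groups.

    The first value is pinned to the left group so each partition appears
    exactly once, giving 2**(k-1) - 1 splits for k values.
    """
    first, rest = values[0], values[1:]
    splits = []
    for r in range(len(rest)):  # extra values joining `first`: 0 .. k-2
        for combo in combinations(rest, r):
            left = [first, *combo]
            right = [v for v in rest if v not in combo]
            splits.append((left, right))
    return splits


def ordinal_binary_splits(ordered_values):
    """Order-preserving binary splits: one cut between adjacent values (k-1 splits)."""
    return [(ordered_values[:i], ordered_values[i:]) for i in range(1, len(ordered_values))]


for split in nominal_binary_splits(["Family", "Sports", "Luxury"]):
    print(split)
for split in ordinal_binary_splits(["Small", "Medium", "Large"]):
    print(split)
```

A grouping such as {Small, Large} vs. {Medium} is produced by the nominal enumeration but not by the ordinal one, because it would violate the order of the attribute's values.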

Test Condition for Continuous Attributes

Splitting Based on Continuous Attributes

How to Determine the Best Split / 1

How to Determine the Best Split / 2

Measures of Node Impurity

Finding the Best Split / 1

Finding the Best Split / 2

Measure of Impurity: GINI

Computing GINI Index of a Single Node
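The Gini index of a node t is GINI(t) = 1 - Σ_j [p(j|t)]², where p(j|t) is the relative frequency of class j at node t. It is 0 when all records belong to one class and reaches its maximum, 1 - 1/n_c for n_c classes, when records are spread evenly. A minimal Python sketch working from per-class record counts (the `gini` name is our own):

```python
def gini(counts):
    """Gini index of a node from its per-class record counts: 1 - sum_j p(j|t)^2."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


# Two-class nodes with 6 records each, from pure to evenly mixed
for counts in ([0, 6], [1, 5], [2, 4], [3, 3]):
    print(counts, round(gini(counts), 3))
```

For these four nodes the index rises from 0 through about 0.278 and 0.444 to the two-class maximum of 0.5.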

Computing GINI Index for a Collection of Nodes
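When a node p with n records is split into k children, the quality of the split is the size-weighted Gini of the children, GINI_split = Σ_{i=1..k} (n_i / n) GINI(i), where n_i is the number of records at child i. A minimal Python sketch; the helper names and the class counts in the example are illustrative.

```python
def gini(counts):
    """Gini index of one node from per-class counts."""
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts)


def gini_split(children):
    """Weighted Gini of a split: sum_i (n_i / n) * GINI(i), children as class-count lists."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * gini(child) for child in children)


# Illustrative binary split of 12 records (6 of class C1, 6 of class C2)
parent = [6, 6]
children = [[5, 2], [1, 4]]  # N1 and N2
print(round(gini(parent), 3), round(gini_split(children), 3))  # 0.5 0.371
```

The split lowers the impurity from 0.5 to about 0.371; among candidate splits, the one with the lowest weighted Gini (largest reduction) is preferred.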

Binary Attributes: Computing GINI Index

Categorical Attributes: Computing GINI Index
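For a categorical attribute, the per-value class counts (the count matrix) are gathered once, and both the multiway split and every two-way grouping of values can then be scored from that matrix without rescanning the data. A minimal Python sketch; the attribute values and counts are hypothetical.

```python
from itertools import combinations


def gini(counts):
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts)


def weighted_gini(groups):
    """Size-weighted Gini of a list of class-count vectors."""
    n = sum(sum(g) for g in groups)
    return sum(sum(g) / n * gini(g) for g in groups)


def merge(count_matrix, values):
    """Add up the per-class count vectors of the given attribute values."""
    return [sum(col) for col in zip(*(count_matrix[v] for v in values))]


def categorical_splits(count_matrix):
    """Weighted Gini for the multiway split and for every two-way grouping."""
    values = list(count_matrix)
    results = {"multiway": weighted_gini([count_matrix[v] for v in values])}
    first, rest = values[0], values[1:]
    for r in range(len(rest)):
        for combo in combinations(rest, r):
            left = [first, *combo]
            right = [v for v in rest if v not in combo]
            key = f"{{{','.join(left)}}} vs {{{','.join(right)}}}"
            results[key] = weighted_gini([merge(count_matrix, left), merge(count_matrix, right)])
    return results


# Hypothetical count matrix: value -> [count of C1, count of C2]
car_type_counts = {"Family": [1, 3], "Sports": [8, 0], "Luxury": [1, 7]}
for split, score in categorical_splits(car_type_counts).items():
    print(f"{split:28s} {score:.4f}")
```

With these counts the multiway split scores 0.1625 and the best two-way grouping, {Family,Luxury} vs {Sports}, about 0.1667; merging values into fewer groups never lowers the weighted Gini, so a two-way split trades a little purity for a simpler tree.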

Continuous Attributes: Computing GINI Index / 1

Continuous Attributes: Computing GINI Index / 2
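For a continuous attribute, testing every distinct value as a threshold by rescanning the data is quadratic. The usual shortcut is to sort the values once and sweep through them, moving one record at a time from the right partition to the left, updating the class counts incrementally, and scoring the Gini index at each candidate position. A minimal Python sketch, assuming binary tests of the form A <= v and midpoints between adjacent distinct values as thresholds; the function name and the example data are hypothetical.

```python
def gini(counts):
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts)


def best_numeric_split(values, labels):
    """Return (threshold, weighted Gini) of the best binary split A <= v vs. A > v.

    Sorts once, then sweeps left to right, updating class counts incrementally.
    A threshold of None means no split beat the unsplit node.
    """
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    left = [0] * len(classes)
    right = [0] * len(classes)
    for _, y in pairs:
        right[index[y]] += 1

    best = (None, gini(right))  # baseline: all records in one partition
    for i in range(n - 1):
        v, y = pairs[i]
        left[index[y]] += 1  # move record i from the right partition to the left
        right[index[y]] -= 1
        next_v = pairs[i + 1][0]
        if v == next_v:  # cannot cut between equal values
            continue
        threshold = (v + next_v) / 2
        score = (i + 1) / n * gini(left) + (n - i - 1) / n * gini(right)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical taxable-income values (in thousands) and class labels
values = [60, 70, 75, 85, 90, 95, 100, 120]
labels = ["No", "No", "No", "Yes", "Yes", "Yes", "No", "No"]
print(best_numeric_split(values, labels))
```

After the O(n log n) sort, each candidate threshold costs only a constant-time count update plus one Gini evaluation, so the whole sweep is linear in the number of records.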

Measure of Impurity: Entropy

Computing Entropy of a Single Node
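The entropy of a node t is Entropy(t) = -Σ_j p(j|t) log₂ p(j|t), with the convention 0 · log₂ 0 = 0. Like the Gini index it is 0 for a pure node; its maximum, log₂ n_c for n_c classes, occurs when records are spread evenly. A minimal Python sketch from per-class counts (the `entropy` name is our own):

```python
from math import log2


def entropy(counts):
    """Entropy of a node from its per-class counts: -sum_j p(j|t) * log2 p(j|t).

    Classes with zero records are skipped, implementing the convention 0 * log2(0) = 0.
    """
    n = sum(counts)
    if n == 0:
        return 0.0
    return sum(-(c / n) * log2(c / n) for c in counts if c > 0)


for counts in ([0, 6], [1, 5], [2, 4], [3, 3]):
    print(counts, round(entropy(counts), 3))
```

For the same four two-class nodes, entropy rises from 0 through roughly 0.65 and 0.92 to 1.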

Computing Information Gain After Splitting
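Information gain measures how much a split reduces entropy: GAIN_split = Entropy(p) - Σ_{i=1..k} (n_i / n) Entropy(i), where parent p with n records is split into k children of n_i records each. A minimal Python sketch; the function names and counts are illustrative.

```python
from math import log2


def entropy(counts):
    n = sum(counts)
    if n == 0:
        return 0.0
    return sum(-(c / n) * log2(c / n) for c in counts if c > 0)


def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted entropy of the children."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Illustrative split of 12 records (6 of C1, 6 of C2) into N1 and N2
print(round(information_gain([6, 6], [[5, 2], [1, 4]]), 4))
# A split into two pure children removes all of the parent's entropy
print(information_gain([3, 3], [[3, 0], [0, 3]]))
```

The split with the largest gain is preferred, which is equivalent to choosing the split with the lowest weighted child entropy.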

Problems with Information Gain

Gain Ratio
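Information gain favours splits into many small, pure partitions (an ID-like attribute is the extreme case). Gain ratio corrects for this by dividing the gain by the split information, GainRATIO_split = GAIN_split / SplitINFO, with SplitINFO = -Σ_{i=1..k} (n_i / n) log₂(n_i / n). A minimal Python sketch; the function names and counts are hypothetical.

```python
from math import log2


def entropy(counts):
    n = sum(counts)
    if n == 0:
        return 0.0
    return sum(-(c / n) * log2(c / n) for c in counts if c > 0)


def information_gain(parent, children):
    n = sum(parent)
    return entropy(parent) - sum(sum(ch) / n * entropy(ch) for ch in children)


def split_info(children):
    """-sum_i (n_i / n) log2(n_i / n): grows with the number of (even) partitions."""
    sizes = [sum(ch) for ch in children]
    n = sum(sizes)
    return sum(-(s / n) * log2(s / n) for s in sizes if s > 0)


def gain_ratio(parent, children):
    si = split_info(children)
    return 0.0 if si == 0 else information_gain(parent, children) / si


parent = [4, 4]
id_like = [[1, 0]] * 4 + [[0, 1]] * 4  # 8 singleton partitions, e.g. a record ID
two_way = [[4, 0], [0, 4]]             # 2 pure partitions
for name, children in (("id_like", id_like), ("two_way", two_way)):
    print(name, information_gain(parent, children), round(gain_ratio(parent, children), 3))
```

Both splits have the maximum gain of 1 bit, but the ID-like split's SplitINFO is log₂ 8 = 3, so its gain ratio drops to 1/3 while the two-way split keeps a ratio of 1.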

Measure of Impurity: Classification Error

Computing Error of a Single Node
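The classification error of a node t is Error(t) = 1 - max_i P(i|t): the fraction of records that would be misclassified if the node predicted its majority class. It is 0 for a pure node and reaches 1 - 1/n_c when classes are spread evenly. A minimal Python sketch (the function name is our own):

```python
def classification_error(counts):
    """Error(t) = 1 - max_i P(i|t), from per-class record counts."""
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - max(counts) / n


for counts in ([0, 6], [1, 5], [2, 4], [3, 3]):
    print(counts, round(classification_error(counts), 3))
```

For the same four nodes the error is 0, about 0.167, about 0.333 and 0.5; unlike Gini and entropy, it grows linearly with the minority fraction.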

Comparison among Impurity Measures for Binary (2-Class) Classification Problems
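With only two classes, each measure depends on a single quantity p, the fraction of the node's records that belong to one class, which makes the comparison direct:

```latex
\begin{aligned}
\mathrm{GINI}(p)    &= 1 - p^2 - (1-p)^2 = 2p(1-p), \\
\mathrm{Entropy}(p) &= -p \log_2 p - (1-p)\log_2(1-p), \\
\mathrm{Error}(p)   &= 1 - \max(p,\, 1-p).
\end{aligned}
```

All three are 0 at p = 0 and p = 1 and peak at p = 1/2, where GINI = 0.5, Entropy = 1 and Error = 0.5. GINI and entropy are strictly concave in p, while the error is piecewise linear.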

Misclassification Error vs. Gini Index
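Because classification error is piecewise linear in the class proportions, a split can make one child purer without changing the weighted error at all, while the Gini index still registers the improvement. A minimal Python sketch with illustrative counts; the helper names are our own.

```python
def gini(counts):
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts)


def classification_error(counts):
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - max(counts) / n


def weighted(measure, children):
    """Weighted impurity of a split: sum_i (n_i / n) * measure(child_i)."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * measure(child) for child in children)


# Illustrative counts: the parent has 7 records of C1 and 3 of C2; the split
# sends 3 C1 records to N1 and the remaining 4 C1 + 3 C2 records to N2.
parent = [7, 3]
children = [[3, 0], [4, 3]]

print("Gini :", round(gini(parent), 3), "->", round(weighted(gini, children), 3))
print("Error:", round(classification_error(parent), 3), "->",
      round(weighted(classification_error, children), 3))
```

Gini falls from 0.42 to about 0.343 because N1 becomes pure, but the error stays at 0.3: the majority class still misclassifies the same three C2 records before and after the split.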

Example: C4.5
- Simple depth-first construction.
- Uses information gain.
- Sorts continuous attributes at each node.
- Needs the entire data set to fit in memory.
- Unsuitable for large data sets (it would need out-of-core sorting).
- You can download the software from:

Scalable Decision Tree Induction / 1
How scalable is decision tree induction?
- Classical decision tree induction is particularly suitable for small data sets.
SLIQ (EDBT'96, Mehta et al.)
- Builds an index for each attribute; only the class list and the current attribute list reside in memory.

Scalable Decision Tree Induction / 2
SLIQ

Sample data for the class buys_computer:
RID | Credit_rating | Age | Buys_computer
1   | excellent     | 38  | yes
2   | excellent     | 26  | yes
3   | fair          | 35  | no
4   | excellent     | 49  | no

Disk-resident attribute lists:
Credit_rating | RID
excellent     | 1
excellent     | 2
excellent     | 4
fair          | 3
…             | …

Age | RID
…   | …

Memory-resident class list:
RID | Buys_computer | node
…   | …             | …
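SLIQ's main structures can be built from the sample data on the slide: one attribute list per attribute, holding (value, RID) pairs sorted once by value, and a class list mapping every RID to its class label and the tree node (leaf) the record currently sits in. A minimal Python sketch of the data structures only, with split evaluation omitted; the function name and the root node id 1 are hypothetical.

```python
# Sample data for the class buys_computer, taken from the slide:
# (RID, credit_rating, age, buys_computer)
DATA = [
    (1, "excellent", 38, "yes"),
    (2, "excellent", 26, "yes"),
    (3, "fair", 35, "no"),
    (4, "excellent", 49, "no"),
]


def build_sliq_structures(rows):
    """Sketch of SLIQ's pre-sorting step.

    Each attribute gets its own list of (value, RID) pairs sorted once by value
    (in SLIQ these lists can stay on disk). The class list maps each RID to its
    class label and current tree node, and is the structure kept in memory.
    """
    credit_list = sorted((credit, rid) for rid, credit, _, _ in rows)
    age_list = sorted((age, rid) for rid, _, age, _ in rows)
    # Every record starts at the root; node id 1 is a hypothetical label for it.
    class_list = {rid: {"class": label, "node": 1} for rid, _, _, label in rows}
    return {"credit_rating": credit_list, "age": age_list}, class_list


attr_lists, class_list = build_sliq_structures(DATA)
print(attr_lists["credit_rating"])  # excellent: RIDs 1, 2, 4; fair: RID 3
print(attr_lists["age"])
print(class_list)
```

Scanning one sorted attribute list and looking up each RID in the class list gives both the record's class and its current leaf, so the candidate splits of every leaf can be evaluated in a single pass without re-sorting at each node.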

Decision Tree Based Classification
Advantages:
- Inexpensive to construct
- Extremely fast at classifying unknown records
- Easy to interpret for small-sized trees
- Accuracy is comparable to other classification techniques for many data sets
Practical Issues of Classification:
- Underfitting and overfitting
- Missing values
- Costs of classification