Data Mining: Classification

Classification
What is classification?
- Classifying tuples in a database.
- In the training set E, each tuple consists of the same set of attributes as the tuples in the large database W; additionally, each tuple has a known class identity.
- Derive the classification mechanism from the training set E, then use this mechanism to classify general data (in W).

Learning Phase
Learning
- Training data are analyzed by a classification algorithm.
- The class label attribute is credit_rating.
- The classifier is represented in the form of classification rules.

Testing Phase
Testing (Classification)
- Test data are used to estimate the accuracy of the classification rules.
- If the accuracy is considered acceptable, the rules can be applied to the classification of new data tuples.
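The testing phase can be illustrated with a minimal sketch. The rules in `classify`, the attribute names `age` and `income`, and the test tuples are all hypothetical and are not taken from the slides; only the class label attribute credit_rating comes from the source.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


def classify(record: Record) -> str:
    """Hypothetical classification rules for credit_rating (assumed, not from the slides)."""
    if record["income"] == "high":
        return "excellent"
    return "fair"


def accuracy(classifier: Callable[[Record], str],
             test_data: List[Tuple[Record, str]]) -> float:
    """Fraction of test tuples whose predicted class matches the known class."""
    correct = sum(1 for record, label in test_data if classifier(record) == label)
    return correct / len(test_data)


# Hypothetical test tuples: (attributes, known credit_rating).
TEST_DATA: List[Tuple[Record, str]] = [
    ({"age": "youth", "income": "low"}, "fair"),
    ({"age": "senior", "income": "high"}, "excellent"),
    ({"age": "middle", "income": "low"}, "fair"),
    ({"age": "youth", "income": "high"}, "fair"),
]

if __name__ == "__main__":
    print(accuracy(classify, TEST_DATA))
```

On these four tuples the rules get three right, so the estimated accuracy is 0.75. Whether that is acceptable depends on the application.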

Classification by Decision Tree
A top-down decision tree generation algorithm: ID3 and its extended version C4.5 (Quinlan '93)
J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993

Decision Tree Generation
- At the start, all the training examples are at the root.
- Partition the examples recursively based on selected attributes.
Attribute Selection
- Favor the partitioning that makes the majority of examples in each partition belong to a single class.
Tree Pruning (Overfitting Problem)
- Aims at removing tree branches that may lead to errors when classifying test data.
- Training data may contain noise, …

Another Example
[Table: 11 training samples, numbered 1 to 11, with attributes Eye (Black, Brown, Blue), Hair (Black, White, Gold), and Height (Short, Tall), and class label Oriental (Yes/No).]

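After the analysis, can you classify the following patterns?
- (Black, Gold, Tall)
- (Blue, White, Short)
Example distributions
[Table: rows are eye colors (Black, Brown, Blue); columns are hair and height combinations (Black Short, Black Tall, White Short, White Tall, Gold Short, Gold Tall); cells are marked +, -, or ? for unknown.]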

Decision Tree

Decision Tree Generation
Attribute Selection (Split Criterion)
- Information Gain (ID3/C4.5/See5)
- Gini Index (CART/IBM Intelligent Miner)
- Inference Power
These measures are also called goodness functions and are used to select the attribute to split on at a tree node during the tree generation phase.

Decision Tree Generation
Branching Scheme
- Determining the tree branch to which a sample belongs
- Binary vs. K-ary splitting
When to stop further splitting of a node
- Impurity measure
Labeling Rule
- A node is labeled as the class to which most samples at the node belong.
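The labeling rule and a simple stopping test can be sketched as follows. The slides do not specify an exact stopping rule, so `should_stop` and its `min_samples` threshold are assumptions made for illustration: a node stops splitting when it is pure (zero impurity) or holds too few samples.

```python
from collections import Counter
from typing import List


def majority_label(labels: List[str]) -> str:
    """Labeling rule: the class to which most samples at the node belong."""
    return Counter(labels).most_common(1)[0][0]


def should_stop(labels: List[str], min_samples: int = 2) -> bool:
    """Assumed stopping test: the node is pure or has fewer than min_samples samples."""
    is_pure = len(set(labels)) <= 1
    return is_pure or len(labels) < min_samples


if __name__ == "__main__":
    node_labels = ["yes", "no", "yes"]
    print(majority_label(node_labels))  # label assigned if this node becomes a leaf
    print(should_stop(node_labels))     # mixed classes and enough samples: keep splitting
```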

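Decision Tree Generation Algorithm: ID3
ID: Iterative Dichotomiser
Entropy of a data set T containing examples from n classes, where p_j is the relative frequency of class j in T:
$$\mathrm{Entropy}(T) = -\sum_{j=1}^{n} p_j \log_2 p_j \qquad (7.1)$$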
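As a rough illustration, entropy and the information gain that ID3 uses as its split criterion can be computed as below. The sketch assumes the standard definitions: entropy is the sum of -p_j log2 p_j over the classes, and the gain of an attribute is the entropy of the node minus the size-weighted entropy of the partitions the attribute induces. The sample rows are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

Row = Dict[str, str]


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of the class labels at a node."""
    total = len(labels)
    return sum(-(count / total) * math.log2(count / total)
               for count in Counter(labels).values())


def information_gain(rows: List[Row], labels: List[str], attribute: str) -> float:
    """Entropy reduction obtained by partitioning the node on one attribute."""
    partitions: Dict[str, List[str]] = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    total = len(labels)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


if __name__ == "__main__":
    # Hypothetical samples, loosely modeled on the Eye/Hair/Height example.
    rows = [
        {"hair": "black", "height": "short"},
        {"hair": "black", "height": "tall"},
        {"hair": "gold", "height": "short"},
        {"hair": "gold", "height": "tall"},
    ]
    labels = ["yes", "yes", "no", "no"]
    print(information_gain(rows, labels, "hair"))    # hair separates the classes fully
    print(information_gain(rows, labels, "height"))  # height tells us nothing here
```

In this toy data, splitting on `hair` yields pure partitions, so its gain equals the full entropy of the node, while `height` yields no gain.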

Decision Tree Algorithm: ID3

Another Example

Decision Tree Generation Algorithm: ID3
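A compact sketch of ID3-style tree generation is given below. It is not the exact procedure from the slides: it assumes categorical attributes, uses information gain as the split criterion, stops when a node is pure or no attributes remain, and labels leaves by majority class. The nested-dict tree representation, the function names, and the training samples are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Union

Row = Dict[str, str]
Tree = Union[str, Dict[str, Dict[str, "Tree"]]]


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of the class labels at a node."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain(rows: List[Row], labels: List[str], attribute: str) -> float:
    """Information gain of splitting the node on one attribute."""
    parts: Dict[str, List[str]] = {}
    for row, label in zip(rows, labels):
        parts.setdefault(row[attribute], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in parts.values())
    return entropy(labels) - remainder


def id3(rows: List[Row], labels: List[str], attributes: List[str]) -> Tree:
    """Recursively partition the examples, starting with all of them at the root."""
    if len(set(labels)) == 1:          # pure node: make a leaf
        return labels[0]
    if not attributes:                 # nothing left to split on: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    branches: Dict[str, Tree] = {}
    for value in sorted({row[best] for row in rows}):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = id3([r for r, _ in subset], [l for _, l in subset], remaining)
    return {best: branches}


def predict(tree: Tree, row: Row) -> str:
    """Follow branches until a leaf; raises KeyError for attribute values unseen in training."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][row[attribute]]
    return tree


# Hypothetical training samples (not the table from the slides).
ROWS = [
    {"eye": "black", "hair": "black", "height": "short"},
    {"eye": "black", "hair": "gold", "height": "tall"},
    {"eye": "blue", "hair": "gold", "height": "tall"},
    {"eye": "blue", "hair": "black", "height": "short"},
    {"eye": "black", "hair": "black", "height": "tall"},
    {"eye": "blue", "hair": "gold", "height": "short"},
]
LABELS = ["yes", "no", "no", "no", "yes", "no"]

if __name__ == "__main__":
    tree = id3(ROWS, LABELS, ["eye", "hair", "height"])
    print(tree)
    print(predict(tree, {"eye": "blue", "hair": "black", "height": "tall"}))
```

The sketch omits pruning, so on noisy training data it would overfit; a fuller version would prune branches or stop splitting earlier.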

Gini Index
If a data set T contains examples from n classes, the gini index, gini(T), is defined as
$$\mathrm{gini}(T) = 1 - \sum_{j=1}^{n} p_j^2$$
where p_j is the relative frequency of class j in T.
If T is split into two subsets T_1 and T_2 with sizes N_1 and N_2 respectively, the gini index of the split data, gini_split(T), is defined as
$$\mathrm{gini}_{split}(T) = \frac{N_1}{N}\,\mathrm{gini}(T_1) + \frac{N_2}{N}\,\mathrm{gini}(T_2)$$
where N = N_1 + N_2.
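A small sketch of the two quantities follows, assuming the standard definitions: gini(T) is one minus the sum of squared class frequencies, and the gini index of a binary split is the size-weighted average of the gini indices of the two subsets. The label lists are hypothetical.

```python
from collections import Counter
from typing import List


def gini(labels: List[str]) -> float:
    """Gini index of a set: 1 minus the sum of squared relative class frequencies."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def gini_split(subset1: List[str], subset2: List[str]) -> float:
    """Gini index of a binary split, weighting each subset by its share of the samples."""
    n = len(subset1) + len(subset2)
    return len(subset1) / n * gini(subset1) + len(subset2) / n * gini(subset2)


if __name__ == "__main__":
    print(gini(["yes", "yes", "no", "no"]))            # evenly mixed two-class node
    print(gini_split(["yes", "yes"], ["no", "no"]))    # split into two pure subsets
    print(gini_split(["yes", "no"], ["yes", "no"]))    # split that separates nothing
```

A lower gini_split value indicates a better split, so the split producing two pure subsets would be preferred over the one that leaves both subsets evenly mixed.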

Inference Power of an Attribute
A feature that is useful in inferring the group identity of a data tuple is said to have good inference power for that group identity.
In Table 1, given the attributes (features) “Gender”, “Beverage”, and “State”, try to find their inference power for “Group id”.

Generating Classification Rules