
Data Mining Techniques: Classification

Classification
What is Classification?
–Classifying tuples in a database
–In a training set E, each tuple consists of the same set of attributes as the tuples in the large database W; additionally, each tuple has a known class identity
–Derive the classification mechanism from the training set E, and then use this mechanism to classify general data (in W)

Learning Phase
–The class label attribute is credit_rating
–Training data are analyzed by a classification algorithm
–The classifier is represented in the form of classification rules

Testing Phase (Classification)
–Test data are used to estimate the accuracy of the classification rules
–If the accuracy is considered acceptable, the rules can be applied to the classification of new data tuples
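A minimal sketch of the two phases using scikit-learn (not part of the original slides); the feature columns and the toy credit_rating labels below are invented placeholders, not the course dataset:

```python
# Learning and testing phases, sketched with scikit-learn.
# The feature values and credit_rating labels are placeholders.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X = [[30, 1], [45, 0], [23, 1], [52, 0], [36, 1], [41, 0]]            # e.g. age, has_loan
y = ["fair", "excellent", "fair", "excellent", "fair", "excellent"]   # credit_rating

# Learning phase: analyze the training data with a classification algorithm.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
clf = DecisionTreeClassifier().fit(X_train, y_train)

# Testing phase: estimate accuracy on held-out data before classifying new tuples.
print("estimated accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```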

Classification by Decision Tree
A top-down decision tree generation algorithm: ID3 and its extended version C4.5 (Quinlan '93)
–J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993

Decision Tree Generation
–At start, all the training examples are at the root
–Partition the examples recursively based on selected attributes
Attribute Selection
–Favor the partitioning that makes the majority of examples in each partition belong to a single class
Tree Pruning (Overfitting Problem)
–Aims at removing tree branches that may lead to errors when classifying test data (training data may contain noise, …)

Another Example
Eye    Hair    Height  Oriental
Black          Short   Yes
Black  White   Tall    Yes
Black  White   Short   Yes
Black          Tall    Yes
Brown  Black   Tall    Yes
Brown  White   Short   Yes
Blue   Gold    Tall    No
Blue   Gold    Short   No
Blue   White   Tall    No
Blue   Black   Short   No
Brown  Gold    Short   No

Decision Tree

Decision Tree Generation
Attribute Selection (Split Criterion)
–Information Gain (ID3/C4.5/See5)
–Gini Index (CART/IBM Intelligent Miner)
–Inference Power
These measures are also called goodness functions and are used to select the attribute on which to split at a tree node during the tree generation phase

Decision Tree Generation
Branching Scheme
–Determining the tree branch to which a sample belongs
–Binary vs. k-ary splitting
When to Stop Splitting a Node
–Impurity measure
Labeling Rule
–A node is labeled as the class to which most samples at the node belong

Decision Tree Generation Algorithm: ID3
–ID: Iterative Dichotomiser
–Attribute selection is based on entropy (Eq. 7.1): Info(S) = - sum_i p_i * log2(p_i), where p_i is the proportion of samples in S belonging to class C_i
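A sketch of the measure the slide refers to; the names (rows, target, entropy, information_gain) and the dict-per-tuple representation are illustrative choices, not notation from the original slides:

```python
# Entropy and information gain for categorical attributes.
# `rows` is a list of dicts (one per tuple); `target` is the class attribute.
from collections import Counter
from math import log2

def entropy(rows, target):
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows, attribute, target):
    total = len(rows)
    remainder = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r for r in rows if r[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder
```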

Decision Tree Algorithm: ID3


Decision Tree Algorithm: ID3

Exercise 2

Decision Tree Generation Algorithm: ID3
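A sketch of the recursive ID3 procedure, reusing the entropy/information_gain helpers from the sketch above; the nested-dict tree representation is an illustrative choice:

```python
# Recursive ID3: stop when a node is pure or no attributes remain,
# otherwise split on the attribute with the highest information gain.
from collections import Counter

def id3(rows, attributes, target):
    classes = [r[target] for r in rows]
    if len(set(classes)) == 1:                    # pure node -> leaf with that class
        return classes[0]
    if not attributes:                            # nothing left to split on -> majority class
        return Counter(classes).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    for value in set(r[best] for r in rows):      # one branch per attribute value
        subset = [r for r in rows if r[best] == value]
        rest = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, rest, target)
    return tree
```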

How to Use a Tree
Directly
–Test the attribute values of the unknown sample against the tree
–A path is traced from the root to a leaf, which holds the class label
Indirectly
–The decision tree is converted to classification rules
–One rule is created for each path from the root to a leaf
–IF-THEN rules are easier for humans to understand
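A sketch of both ways of using a tree, assuming the nested-dict representation from the ID3 sketch above; classify and to_rules are illustrative names:

```python
# Direct use: trace a path from the root to a leaf for an unknown sample.
def classify(tree, sample):
    while isinstance(tree, dict):
        attribute = next(iter(tree))               # attribute tested at this node
        tree = tree[attribute][sample[attribute]]  # follow the matching branch
    return tree                                    # leaf holds the class label

# Indirect use: one IF-THEN rule per root-to-leaf path.
def to_rules(tree, conditions=()):
    if not isinstance(tree, dict):                 # leaf: emit one rule
        return [f"IF {' AND '.join(conditions) or 'TRUE'} THEN {tree}"]
    attribute = next(iter(tree))
    rules = []
    for value, subtree in tree[attribute].items():
        rules += to_rules(subtree, conditions + (f"{attribute} = {value}",))
    return rules
```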

Generating Classification Rules

There are 4 decision rules generated by the tree
–Watch the game and home team wins and out with friends, then beer
–Watch the game and home team wins and sitting at home, then diet soda
–Watch the game and home team loses and out with friends, then beer
–Watch the game and home team loses and sitting at home, then milk
Optimized rules
–Watch the game and out with friends, then beer
–Watch the game and home team wins and sitting at home, then diet soda
–Watch the game and home team loses and sitting at home, then milk

Decision Tree Generation Algorithm: ID3
–All attributes are assumed to be categorical (discretized)
–Can be modified for continuous-valued attributes: dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals (A ≥ V | A < V)
–Prefers attributes with many values
–Cannot handle missing attribute values
–Attribute dependencies are not considered in this algorithm

Attribute Selection in C4.5
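The formula on the original slide did not survive extraction. C4.5 selects attributes by gain ratio (information gain divided by the split information of the attribute), which counteracts ID3's preference for many-valued attributes. A sketch, reusing the information_gain helper from the ID3 sketch above:

```python
# Gain ratio = information gain / split information of the attribute.
from collections import Counter
from math import log2

def split_info(rows, attribute):
    counts = Counter(r[attribute] for r in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def gain_ratio(rows, attribute, target):
    si = split_info(rows, attribute)
    return 0.0 if si == 0 else information_gain(rows, attribute, target) / si
```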

Handling Continuous Attributes

[Figure: attribute values sorted in ascending order, with three candidate cut points marked (First Cut, Second Cut, Third Cut)]

Handling Continuous Attributes
[Figure: decision tree for a stock-trading example. The root tests Price On Date T+1 against a first cut value (> vs. <=); deeper nodes test Price On Date T against a second cut and Price On Date T+1 against a third cut; the leaves are Buy and Sell. The numeric cut values are not shown.]
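A sketch of how a binary cut point for a continuous attribute can be chosen, in the spirit of the figure above: candidate thresholds are taken between consecutive sorted values and the one with the highest gain is kept. It reuses the entropy helper from the ID3 sketch; best_cut is an illustrative name:

```python
# Evaluate each midpoint between consecutive sorted values as a
# candidate threshold for the binary split A <= V vs. A > V.
def best_cut(rows, attribute, target):
    values = sorted(set(r[attribute] for r in rows))
    best_gain, best_threshold = -1.0, None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2.0            # candidate cut point
        left  = [r for r in rows if r[attribute] <= threshold]
        right = [r for r in rows if r[attribute] >  threshold]
        gain = entropy(rows, target) - (
            len(left) / len(rows) * entropy(left, target)
            + len(right) / len(rows) * entropy(right, target))
        if gain > best_gain:
            best_gain, best_threshold = gain, threshold
    return best_threshold, best_gain
```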

Exercise 3: Analyzing Home Prices
SF: Square Feet; CM: No. of Homes in Community
ID  Location  Type      Miles  SF    CM  Home Price (K)
1   Urban     Detached                   High
2   Rural     Detached  9      2000  5   Low
3   Urban     Attached                   High
4   Urban     Detached                   High
5   Rural     Detached                   Low
6   Rural     Detached                   Medium
7   Rural     Detached                   Medium
8   Urban     Attached                   High
9   Rural     Detached                   Low
10  Urban     Attached                   Medium

Unknown Attribute Values in C4.5 Training Testing

Unknown Attribute Values Adjustment of Attribute Selection Measure
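The figure for this slide is missing. As commonly described for C4.5 (stated here as an assumption about what the slide showed), the gain of an attribute is computed over the cases whose value is known and then scaled by the known fraction F. A sketch, reusing the information_gain helper from the ID3 sketch above:

```python
# C4.5-style adjustment when attribute `attribute` has unknown ("?") values:
# gain is computed on the known cases only and scaled by their fraction F.
def adjusted_gain(rows, attribute, target, unknown="?"):
    known = [r for r in rows if r[attribute] != unknown]
    if not known:
        return 0.0
    fraction_known = len(known) / len(rows)        # F
    return fraction_known * information_gain(known, attribute, target)
```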

Fill in Approach

Probability Approach

Unknown Attribute Values Partitioning the Training Set

Probability Approach

Unknown Attribute Values Classifying an Unseen Case

Probability Approach

Evaluation – Coincidence Matrix (Decision Tree Model)
–Cost = $190 * (closing a good account) + $10 * (keeping a bad account open)
–Accuracy = (36 + 632) / 718 = 93.0%
–Precision for Insolvent = 36 / 58 = 62.07%
–Recall for Insolvent = 36 / 64 = 56.25%
–F Measure = 2 * Precision * Recall / (Precision + Recall) = 2 * 62.07% * 56.25% / (62.07% + 56.25%) = 0.698 / 1.183 = 0.59
–Cost = $190 * 22 + $10 * 28 = $4,460
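A sketch that reproduces the slide's evaluation arithmetic; the four counts are read off the coincidence matrix described above:

```python
# 36 insolvent accounts correctly flagged, 632 solvent accounts correctly kept,
# 22 good accounts closed (false positives), 28 bad accounts kept open (false negatives).
tp, tn, fp, fn = 36, 632, 22, 28

accuracy  = (tp + tn) / (tp + tn + fp + fn)                 # 668 / 718 ~= 0.930
precision = tp / (tp + fp)                                  # 36 / 58  ~= 0.621
recall    = tp / (tp + fn)                                  # 36 / 64  ~= 0.563
f_measure = 2 * precision * recall / (precision + recall)   # ~= 0.59
cost      = 190 * fp + 10 * fn                              # $4,460

print(accuracy, precision, recall, f_measure, cost)
```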

Decision Tree Generation Algorithm: Gini Index
–If a data set S contains examples from n classes, the gini index gini(S) is defined as gini(S) = 1 - sum_j (p_j)^2, where p_j is the relative frequency of class C_j in S
–If a data set S of size N is split into two subsets S_1 and S_2 with sizes N_1 and N_2 respectively, the gini index of the split data is defined as gini_split(S) = (N_1/N) * gini(S_1) + (N_2/N) * gini(S_2)

Decision Tree Generation Algorithm: Gini Index
–The attribute providing the smallest gini_split(S) is chosen to split the node
–The computation cost of the gini index is lower than that of information gain
–All attribute splits are binary in IBM Intelligent Miner (A ≥ V | A < V)
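A sketch of the two Gini formulas above, using the same dict-per-tuple convention as the earlier sketches; gini and gini_split are illustrative names:

```python
# Gini index of a data set and of a binary split, following the definitions above.
from collections import Counter

def gini(rows, target):
    total = len(rows)
    counts = Counter(r[target] for r in rows)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def gini_split(subset1, subset2, target):
    n1, n2 = len(subset1), len(subset2)
    n = n1 + n2
    return n1 / n * gini(subset1, target) + n2 / n * gini(subset2, target)
```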

Decision Tree Generation Algorithm: Inference Power
–A feature that is useful in inferring the group identity of a data tuple is said to have good inference power for that group identity
–In Table 1, given the attributes (features) "Gender", "Beverage", and "State", try to find their inference power for "Group id"

Naive Bayesian Classification
–Each data sample is an n-dimensional feature vector X = (x1, x2, …, xn) for attributes A1, A2, …, An
–Suppose there are m classes C = {C1, C2, …, Cm}
–The classifier predicts that X belongs to the class Ci with the highest posterior probability conditioned on X: X belongs to Ci iff P(Ci|X) > P(Cj|X) for all 1 <= j <= m, j != i

Naive Bayesian Classification
–P(Ci|X) = P(X|Ci) P(Ci) / P(X)
–Since P(Ci|X) = P(Ci ∩ X) / P(X) and P(X|Ci) = P(Ci ∩ X) / P(Ci), it follows that P(Ci|X) P(X) = P(X|Ci) P(Ci)
–P(Ci) = si / s, where si is the number of training samples of class Ci and s is the total number of training samples
–Assumption: attributes are conditionally independent, so P(X|Ci) = P(x1|Ci) P(x2|Ci) … P(xn|Ci)
–P(X) is the same for all classes and can be ignored

Naive Bayesian Classification
Classify X = (age = "<=30", income = "medium", student = "yes", credit_rating = "fair")
–P(buys_computer = yes) = 9/14
–P(buys_computer = no) = 5/14
–P(age = <=30 | buys_computer = yes) = 2/9
–P(age = <=30 | buys_computer = no) = 3/5
–P(income = medium | buys_computer = yes) = 4/9
–P(income = medium | buys_computer = no) = 2/5
–P(student = yes | buys_computer = yes) = 6/9
–P(student = yes | buys_computer = no) = 1/5
–P(credit_rating = fair | buys_computer = yes) = 6/9
–P(credit_rating = fair | buys_computer = no) = 2/5
–P(X | buys_computer = yes) = 0.044
–P(X | buys_computer = no) = 0.019
–P(buys_computer = yes | X) ∝ P(X | buys_computer = yes) P(buys_computer = yes) = 0.028
–P(buys_computer = no | X) ∝ P(X | buys_computer = no) P(buys_computer = no) = 0.007
–Therefore, the classifier predicts buys_computer = yes for X
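A sketch that reproduces the arithmetic of this example from the conditional probabilities listed above; the underlying buys_computer training table is not shown on the slide, so only the stated probabilities are used:

```python
# Naive Bayesian prediction for X = (age <= 30, income = medium,
# student = yes, credit_rating = fair), using the slide's probabilities.
priors = {"yes": 9 / 14, "no": 5 / 14}
conditionals = {
    "yes": [2 / 9, 4 / 9, 6 / 9, 6 / 9],   # P(age|yes), P(income|yes), P(student|yes), P(credit|yes)
    "no":  [3 / 5, 2 / 5, 1 / 5, 2 / 5],
}

for label in ("yes", "no"):
    likelihood = 1.0
    for p in conditionals[label]:
        likelihood *= p                    # class-conditional independence assumption
    print(label, round(likelihood, 3), round(likelihood * priors[label], 3))
# yes 0.044 0.028  /  no 0.019 0.007  -> predict buys_computer = yes
```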

Homework Assignment